postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-28 17:49:35 -04:00

Author	SHA1	Message	Date
Tom Lane	dc7521dcb9	Fix planner's handling of RETURNING lists in writable CTEs. setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists, which meant they were left with the wrong varnos when the RETURNING list was in a subquery. That was never possible before writable CTEs, of course, but now it's broken. The executor fails to notice any problem because ExecEvalVar just references the ecxt_scantuple for any normal varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a recent complaint from Bartosz Dmytrak. Since the eventual rtoffset of the subquery is not known at the time we are preparing its plan node, the previous scheme of executing set_returning_clause_references() at that time cannot handle this adjustment. Fortunately, it turns out that we don't really need to do it that way, because all the needed information is available during normal setrefs.c execution; we just have to dig it out of the ModifyTable node. So, do that, and get rid of the kluge of early setrefs processing of RETURNING lists. (This is a little bit of a cheat in the case of inherited UPDATE/DELETE, because we are not passing a "root" struct that corresponds exactly to what the subplan was built with. But that doesn't matter, and anyway this is less ugly than early setrefs processing was.) Back-patch to 9.1, where the problem became possible to hit.	2012-04-25 20:20:43 -04:00
Robert Haas	f8c53d5b41	Fix copyfuncs/equalfuncs support for ReassignOwnedStmt. Noah Misch	2012-04-18 10:46:26 -04:00
Heikki Linnakangas	10fcfada23	Don't wait for the commit record to be replicated if we wrote no WAL. When using synchronous replication, we waited for the commit record to be replicated, but if we our transaction didn't write any other WAL records, that's not required because we don't even flush the WAL locally to disk in that case. This lead to long waits when committing a transaction that only modified a temporary table. Bug spotted by Thom Brown.	2012-04-17 16:36:59 +03:00
Tom Lane	79853989c5	Clamp indexscan filter condition cost estimate to be not less than zero. cost_index tries to estimate the per-tuple costs of evaluating filter conditions (a/k/a qpquals) by subtracting the estimated cost of the indexqual conditions from that of the baserestrictinfo conditions. This is correct so long as the indexquals list is a subset of the baserestrictinfo list. However, in the presence of derived indexable conditions it's completely wrong, leading to bogus or even negative scan cost estimates, as seen for example in bug #6579 from Istvan Endredy. In practice the problem isn't severe except in the specific case of a LIKE optimization on a functional index containing a very expensive function. A proper fix for this might change cost estimates by more than people would like for stable branches, so in the back branches let's just clamp the cost difference to be not less than zero. That will at least prevent completely insane behavior, while not changing the results normally.	2012-04-11 20:24:25 -04:00
Tom Lane	35400e14b8	Ignore missing schemas during non-interactive assignment of search_path. This aligns 9.1's behavior with that of older branches. HEAD is now even laxer, ignoring missing schemas all the time, but that seems like too big a change for a released branch. Per complaint from Robert Haas.	2012-04-11 12:03:05 -04:00
Tom Lane	0e20abdb47	Fix an Assert that turns out to be reachable after all. estimate_num_groups() gets unhappy with create table empty(); select * from empty except select * from empty e2; I can't see any actual use-case for such a query (and the table is illegal per SQL spec), but it seems like a good idea that it not cause an assert failure.	2012-04-09 11:58:40 -04:00
Heikki Linnakangas	88e071533f	set_stack_base() no longer needs to be called in PostgresMain. This was a thinko in previous commit. Now that stack base pointer is now set in PostmasterMain and SubPostmasterMain, it doesn't need to be set in PostgresMain anymore.	2012-04-08 19:40:54 +03:00
Heikki Linnakangas	ef29bb1f72	Do stack-depth checking in all postmaster children. We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issues with PL/Java, namely that the stack base pointer it sets is not really the base of the stack, it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.	2012-04-08 19:08:13 +03:00
Tom Lane	e2923d3419	Fix misleading output from gin_desc(). XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.	2012-04-06 18:10:26 -04:00
Tom Lane	60243e89a7	Fix syslogger to not lose log coherency under high load. The original coding of the syslogger had an arbitrary limit of 20 large messages concurrently in progress, after which it would just punt and dump message fragments to the output file separately. Our ambitions are a bit higher than that now, so allow the data structure to expand as necessary. Reported and patched by Andrew Dunstan; some editing by Tom	2012-04-04 15:05:16 -04:00
Simon Riggs	20d98ab6e4	Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina	2012-03-29 14:57:08 +01:00
Tom Lane	9f0a017076	Fix COPY FROM for null marker strings that correspond to invalid encoding. The COPY documentation says "COPY FROM matches the input against the null string before removing backslashes". It is therefore reasonable to presume that null markers like E'\\0' will work ... and they did, until someone put the tests in the wrong order during microoptimization-driven rewrites. Since then, we've been failing if the null marker is something that would de-escape to an invalidly-encoded string. Since null markers generally need to be something that can't appear in the data, this represents a nontrivial loss of functionality; surprising nobody noticed it earlier. Per report from Jeff Davis. Backpatch to 8.4 where this got broken.	2012-03-25 23:17:27 -04:00
Tom Lane	811a2cbc16	Fix planner's handling of outer PlaceHolderVars within subqueries. For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.	2012-03-24 16:21:48 -04:00
Tom Lane	eca0c389f1	Cast some printf arguments to avoid possibly-nonportable behavior. Per compiler warnings on buildfarm member black_firefly.	2012-03-23 20:18:08 -04:00
Robert Haas	16f42be840	Don't allow CREATE TABLE AS to put relations in pg_global. This was never intended to be allowed, and is blocked for an ordinary CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This commit won't do anything to fix existing cases where this has loophole has been exploited, but it still seems prudent to lock it down going forward. Back-branch commit only, as this problem has been refactored away on the master branch. Andres Freund	2012-03-21 12:39:48 -04:00
Heikki Linnakangas	04e9dc6e01	Fix bug where walsender goes into a busy loop if connection is terminated. The problem was that ResetLatch was not being called in the walsender loop if the connection was terminated, so WaitLatch never sleeps until the terminated connection is detected. In the master-branch, this was already fixed as a side-effect of some refactoring of the loop. This commit backports that refactoring to 9.1. 9.0 does not have this bug, because we didn't use latches back then. Fujii Masao	2012-03-21 17:43:53 +02:00
Tom Lane	805f798e0e	Revisit handling of UNION ALL subqueries with non-Var output columns. In commit `57664ed25e` I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit `b28ffd0fcc` fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.	2012-03-16 13:11:20 -04:00
Tom Lane	ad05b5d28d	Fix some issues with temp/transient tables in extension scripts. Phil Sorber reported that a rewriting ALTER TABLE within an extension update script failed, because it creates and then drops a placeholder table; the drop was being disallowed because the table was marked as an extension member. We could hack that specific case but it seems likely that there might be related cases now or in the future, so the most practical solution seems to be to create an exception to the general rule that extension member objects can only be dropped by dropping the owning extension. To wit: if the DROP is issued within the extension's own creation or update scripts, we'll allow it, implicitly performing an "ALTER EXTENSION DROP object" first. This will simplify cases such as extension downgrade scripts anyway. No docs change since we don't seem to have documented the idea that you would need ALTER EXTENSION DROP for such an action to begin with. Also, arrange for explicitly temporary tables to not get linked as extension members in the first place, and the same for the magic pg_temp_nnn schemas that are created to hold them. This prevents assorted unpleasant results if an extension script creates a temp table: the forced drop at session end would either fail or remove the entire extension, and neither of those outcomes is desirable. Note that this doesn't fix the ALTER TABLE scenario, since the placeholder table is not temp (unless the table being rewritten is). Back-patch to 9.1.	2012-03-08 15:52:34 -05:00
Tom Lane	ef03b34550	Allow child-relation entries to be made in ec_has_const EquivalenceClasses. This fixes an oversight in commit `11cad29c91`, which introduced MergeAppend plans. Before that happened, we never particularly cared about the sort ordering of scans of inheritance child relations, since appending their outputs together would destroy any ordering anyway. But now it's important to be able to match child relation sort orderings to those of the surrounding query. The original coding of add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the originally-correct grounds that adding child expressions to them was useless. The effect of this is that when a parent variable is equated to a constant, we can't recognize that index columns on the equivalent child variables are not sort-significant; that is, we can't recognize that a child index on, say, (x, y) is able to generate output in "ORDER BY y" order when there is a clause "WHERE x = constant". Adding child expressions to the (x, constant) EquivalenceClass fixes this, without any downside that I can see other than a few more planner cycles expended on such queries. Per recent gripe from Robert McGehee. Back-patch to 9.1 where MergeAppend was introduced.	2012-03-02 14:28:49 -05:00
Heikki Linnakangas	86073a2b7a	Correctly detect SSI conflicts of prepared transactions after crash. A prepared transaction can get new conflicts in and out after preparing, so we cannot rely on the in- and out-flags stored in the statefile at prepare- time. As a quick fix, make the conservative assumption that after a restart, all prepared transactions are considered to have both in- and out-conflicts. That can lead to unnecessary rollbacks after a crash, but that shouldn't be a big problem in practice; you don't want prepared transactions to hang around for a long time anyway. Dan Ports	2012-02-29 15:43:43 +02:00
Tom Lane	57b100fe0f	Fix some more bugs in GIN's WAL replay logic. In commit `4016bdef8a` I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.	2012-02-26 15:12:23 -05:00
Tom Lane	e6fcb03dc0	Remove arbitrary limitation on length of common name in SSL certificates. Both libpq and the backend would truncate a common name extracted from a certificate at 32 bytes. Replace that fixed-size buffer with dynamically allocated string so that there is no hard limit. While at it, remove the code for extracting peer_dn, which we weren't using for anything; and don't bother to store peer_cn longer than we need it in libpq. This limit was not so terribly unreasonable when the code was written, because we weren't using the result for anything critical, just logging it. But now that there are options for checking the common name against the server host name (in libpq) or using it as the user's name (in the server), this could result in undesirable failures. In the worst case it even seems possible to spoof a server name or user name, if the correct name is exactly 32 bytes and the attacker can persuade a trusted CA to issue a certificate in which that string is a prefix of the certificate's common name. (To exploit this for a server name, he'd also have to send the connection astray via phony DNS data or some such.) The case that this is a realistic security threat is a bit thin, but nonetheless we'll treat it as one. Back-patch to 8.4. Older releases contain the faulty code, but it's not a security problem because the common name wasn't used for anything interesting. Reported and patched by Heikki Linnakangas Security: CVE-2012-0867	2012-02-23 15:48:09 -05:00
Tom Lane	54e2b6488b	Require execute permission on the trigger function for CREATE TRIGGER. This check was overlooked when we added function execute permissions to the system years ago. For an ordinary trigger function it's not a big deal, since trigger functions execute with the permissions of the table owner, so they couldn't do anything the user issuing the CREATE TRIGGER couldn't have done anyway. However, if a trigger function is SECURITY DEFINER, that is not the case. The lack of checking would allow another user to install it on his own table and then invoke it with, essentially, forged input data; which the trigger function is unlikely to realize, so it might do something undesirable, for instance insert false entries in an audit log table. Reported by Dinesh Kumar, patch by Robert Haas Security: CVE-2012-0866	2012-02-23 15:39:02 -05:00
Peter Eisentraut	602dd1eeaa	Translation updates	2012-02-23 20:40:55 +02:00
Peter Eisentraut	09c00af94e	Remove inappropriate quotes And adjust wording for consistency.	2012-02-23 12:52:03 +02:00
Alvaro Herrera	cfd1c382f0	REASSIGN OWNED: Support foreign data wrappers and servers This was overlooked when implementing those kinds of objects, in commit `cae565e503`. Per report from Pawel Casperek.	2012-02-22 17:32:42 -03:00
Simon Riggs	11c730f412	Correctly initialise shared recoveryLastRecPtr in recovery. Previously we used ReadRecPtr rather than EndRecPtr, which was not a serious error but caused pg_stat_replication to report incorrect replay_location until at least one WAL record is replayed. Fujii Masao	2012-02-22 13:53:48 +00:00
Tom Lane	6182e01f18	Don't clear btpo_cycleid during _bt_vacuum_one_page. When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not actually within a vacuum operation, but rather in an ordinary insertion process that could well be running concurrently with a vacuum. So clearing the cycleid is incorrect, and could cause the concurrent vacuum to miss removing tuples that it needs to remove. This is a longstanding bug introduced by commit `e6284649b9` of 2006-07-25. I believe it explains Maxim Boguk's recent report of index corruption, and probably some other previously unexplained reports. In 9.0 and up this is a one-line fix; before that we need to introduce a flag to tell _bt_delitems what to do.	2012-02-21 15:03:44 -05:00
Magnus Hagander	3d2aa2c086	Avoid double close of file handle in syslogger on win32 This causes an exception when running under a debugger or in particular when running on a debug version of Windows. Patch from MauMau	2012-02-21 17:13:59 +01:00
Tom Lane	d3158f339f	Fix regex back-references that are directly quantified with . The syntax "\n", that is a backref with a * quantifier directly applied to it, has never worked correctly in Spencer's library. This has been an open bug in the Tcl bug tracker since 2005: https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894 The core of the problem is in parseqatom(), which first changes "\n" to "\n+\|" and then applies repeat() to the NFA representing the backref atom. repeat() thinks that any arc leading into its "rp" argument is part of the sub-NFA to be repeated. Unfortunately, since parseqatom() already created the arc that was intended to represent the empty bypass around "\n+", this arc gets moved too, so that it now leads into the state loop created by repeat(). Thus, what was supposed to be an "empty" bypass gets turned into something that represents zero or more repetitions of the NFA representing the backref atom. In the original example, in place of ^([bc])\1$ we now have something that acts like ^([bc])(\1+\|[bc])$ At runtime, the branch involving the actual backref fails, as it's supposed to, but then the other branch succeeds anyway. We could no doubt fix this by some rearrangement of the operations in parseqatom(), but that code is plenty ugly already, and what's more the whole business of converting "x" to "x+\|" probably needs to go away to fix another problem I'll mention in a moment. Instead, this patch suppresses the *-conversion when the target is a simple backref atom, leaving the case of m == 0 to be handled at runtime. This makes the patch in regcomp.c a one-liner, at the cost of having to tweak cbrdissect() a little. In the event I went a bit further than that and rewrote cbrdissect() to check all the string-length-related conditions before it starts comparing characters. It seems a bit stupid to possibly iterate through many copies of an n-character backreference, only to fail at the end because the target string's length isn't a multiple of n --- we could have found that out before starting. The existing coding could only be a win if integer division is hugely expensive compared to character comparison, but I don't know of any modern machine where that might be true. This does not fix all the problems with quantified back-references. In particular, the code is still broken for back-references that appear within a larger expression that is quantified (so that direct insertion of the quantification limits into the BACKREF node doesn't apply). I think fixing that will take some major surgery on the NFA code, specifically introducing an explicit iteration node type instead of trying to transform iteration into concatenation of modified regexps. Back-patch to all supported branches. In HEAD, also add a regression test case for this. (It may seem a bit silly to create a regression test file for just one test case; but I'm expecting that we will soon import a whole bunch of regex regression tests from Tcl, so might as well create the infrastructure now.)	2012-02-20 00:52:42 -05:00
Tom Lane	04a0231e56	Run a portal's cleanup hook immediately when pushing it to FAILED state. This extends the changes of commit `6252c4f9e2` so that we run the cleanup hook earlier for failure cases as well as success cases. As before, the point is to avoid an assertion failure from an Assert I added in commit `a874fe7b4c`, which was meant to check that no user-written code can be called during portal cleanup. This fixes a case reported by Pavan Deolasee in which the Assert could be triggered during backend exit (see the new regression test case), and also prevents the possibility that the cleanup hook is run after portions of the portal's state have already been recycled. That doesn't really matter in current usage, but it foreseeably could matter in the future. Back-patch to 9.1 where the Assert in question was added.	2012-02-15 16:18:39 -05:00
Tom Lane	1e7d008bf8	Throw error sooner for unlogged GiST indexes. Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.	2012-02-08 16:19:31 -05:00
Tom Lane	ef19c9dfaa	Fix postmaster to attempt restart after a hot-standby crash. The postmaster was coded to treat any unexpected exit of the startup process (i.e., the WAL replay process) as a catastrophic crash, and not try to restart it. This was OK so long as the startup process could not have any sibling postmaster children. However, if a hot-standby backend crashes, we SIGQUIT the startup process along with everything else, and the resulting exit is hardly "unexpected". Treating it as such meant we failed to restart a standby server after any child crash at all, not only a crash of the WAL replay process as intended. Adjust that. Back-patch to 9.0 where hot standby was introduced.	2012-02-06 15:29:41 -05:00
Tom Lane	c74ad4e55b	Avoid throwing ERROR during WAL replay of DROP TABLESPACE. Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless removal of the tablespace's directories succeeds, that does not guarantee that the same operation will succeed during WAL replay. Foreseeable reasons for it to fail include temp files created in the tablespace by Hot Standby backends, wrong directory permissions on a standby server, etc etc. The original coding threw ERROR if replay failed to remove the directories, but that is a serious overreaction. Throwing an error aborts recovery, and worse means that manual intervention will be needed to get the database to start again, since otherwise the same error will recur on subsequent attempts to replay the same WAL record. And the consequence of failing to remove the directories is only that some probably-small amount of disk space is wasted, so it hardly seems justified to throw an error. Accordingly, arrange to report such failures as LOG messages and keep going when a failure occurs during replay. Back-patch to 9.0 where Hot Standby was introduced. In principle such problems can occur in earlier releases, but Hot Standby increases the odds of trouble significantly. Given the lack of field reports of such issues, I'm satisfied with patching back as far as the patch applies easily.	2012-02-06 14:44:04 -05:00
Tom Lane	f1b8a84dec	Avoid problems with OID wraparound during WAL replay. Fix a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The consequences of a failure to update nextOid would be pretty minimal, since we have long had the code set up to obtain another OID and try again if the generated value is already in use. But in the worst case there could be significant performance glitches while such loops iterate through many already-used OIDs before finding a free one. The odds of a wraparound happening during WAL replay would be small in a crash-recovery scenario, and the length of any ensuing OID-assignment stall quite limited anyway. But neither of these statements hold true for a replication slave that follows a WAL stream for a long period; its behavior upon going live could be almost unboundedly bad. Hence it seems worth back-patching this fix into all supported branches. Already fixed in HEAD in commit `c6d76d7c82`.	2012-02-06 13:14:46 -05:00
Tom Lane	b2e1eaa4a1	Fix transient clobbering of shared buffers during WAL replay. RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.	2012-02-05 15:49:35 -05:00
Simon Riggs	8572cc495c	Resolve timing issue with logging locks for Hot Standby. We log AccessExclusiveLocks for replay onto standby nodes, but because of timing issues on ProcArray it is possible to log a lock that is still held by a just committed transaction that is very soon to be removed. To avoid any timing issue we avoid applying locks made by transactions with InvalidXid. Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee	2012-02-01 09:31:07 +00:00
Heikki Linnakangas	b896e1e852	Accept a non-existent value in "ALTER USER/DATABASE SET ..." command. When default_text_search_config, default_tablespace, or temp_tablespaces setting is set per-user or per-database, with an "ALTER USER/DATABASE SET ..." statement, don't throw an error if the text search configuration or tablespace does not exist. In case of text search configuration, even if it doesn't exist in the current database, it might exist in another database, where the setting is intended to have its effect. This behavior is now the same as search_path's. Tablespaces are cluster-wide, so the same argument doesn't hold for tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER SET ..." statements before the "CREATE TABLESPACE" statements. Arguably that's pg_dumpall's fault - it should dump the statements in such an order that the tablespace is created first and then the "ALTER USER SET default_tablespace ..." statements after that - but it seems better to be consistent with search_path and default_text_search_config anyway. Besides, you could still create a dump that throws an error, by creating the tablespace, running "ALTER USER SET default_tablespace", then dropping the tablespace and running pg_dumpall on that. Backpatch to all supported versions.	2012-01-30 11:19:38 +02:00
Tom Lane	106123fa26	Fix pushing of index-expression qualifications through UNION ALL. In commit `57664ed25e`, I made the planner wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some issues with EquivalenceClass processing. However, this means that any upper-level WHERE clauses mentioning such outputs will now contain PlaceHolderVars after they're pushed down into the appendrel child, and that prevents indxpath.c from recognizing that they could be matched to index expressions. To fix, add explicit stripping of PlaceHolderVars from index operands, same as we have long done for RelabelType nodes. Add a regression test covering both this and the plain-UNION case (which is a totally different code path, but should also be able to do it). Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the previous change.	2012-01-29 16:31:31 -05:00
Tom Lane	1a096957d7	Fix handling of init_plans list in inheritance_planner(). Formerly we passed an empty list to each per-child-table invocation of grouping_planner, and then merged the results into the global list. However, that fails if there's a CTE attached to the statement, because create_ctescan_plan uses the list to find the plan referenced by a CTE reference; so it was unable to find any CTEs attached to the outer UPDATE or DELETE. But there's no real reason not to use the same list throughout the process, and doing so is simpler and faster anyway. Per report from Josh Berkus of "could not find plan for CTE" failures. Back-patch to 9.1 where we added support for WITH attached to UPDATE or DELETE. Add some regression test cases, too.	2012-01-28 20:24:49 -05:00
Tom Lane	0b1e177953	Fix handling of data-modifying CTE subplans in EvalPlanQual. We can't just skip initializing such subplans, because the referencing CTE node will expect to find the subplan available when it initializes. That in turn means that ExecInitModifyTable must allow the case (which actually it needed to do anyway, since there's no guarantee that ModifyTable is exactly at the top of the CTE plan tree). So move the complaint about not being allowed in EvalPlanQual mode to execution instead of initialization. Testing turned up yet another problem, which is that we'd try to re-initialize the result relation's index list, leading to leaks and dangling pointers. Per report from Phil Sorber. Back-patch to 9.1 where data-modifying CTEs were introduced.	2012-01-28 17:44:03 -05:00
Heikki Linnakangas	02f377dbe5	Fix corner case in cleanup of transactions using SSI. When the only remaining active transactions are READ ONLY, we do a "partial cleanup" of committed transactions because certain types of conflicts aren't possible anymore. For committed r/w transactions, we release the SIREAD locks but keep the SERIALIZABLEXACT. However, for committed r/o transactions, we can go further and release the SERIALIZABLEXACT too. The problem was with the latter case: we were returning the SERIALIZABLEXACT to the free list without removing it from the finished list. The only real change in the patch is the SHMQueueDelete line, but I also reworked some of the surrounding code to make it obvious that r/o and r/w transactions are handled differently -- the existing code felt a bit too clever. Dan Ports	2012-01-18 17:12:58 +02:00
Tom Lane	b994c57a80	Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows. In commit `7b0d0e9356`, I made CLUSTER and VACUUM FULL try to preserve toast value OIDs from the original toast table to the new one. However, if we have to copy both live and recently-dead versions of a row that has a toasted column, those versions may well reference the same toast value with the same OID. The patch then led to duplicate-key failures as we tried to insert the toast value twice with the same OID. (The previous behavior was not very desirable either, since it would have silently inserted the same value twice with different OIDs. That wastes space, but what's worse is that the toast values inserted for already-dead heap rows would not be reclaimed by subsequent ordinary VACUUMs, since they go into the new toast table marked live not deleted.) To fix, check if the copied OID already exists in the new toast table, and if so, assume that it stores the desired value. This is reasonably safe since the only case where we will copy an OID from a previous toast pointer is when toast_insert_or_update was given that toast pointer and so we just pulled the data from the old table; if we got two different values that way then we have big problems anyway. We do have to assume that no other backend is inserting items into the new toast table concurrently, but that's surely safe for CLUSTER and VACUUM FULL. Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous patch.	2012-01-12 16:40:19 -05:00
Robert Haas	f9f0484504	Fix variable confusion in BufferSync(). As noted by Heikki Linnakangas, the previous coding confused the "flags" variable with the "mask" variable. The affect of this appears to be that unlogged buffers would get written out at every checkpoint rather than only at shutdown time. Although that's arguably an acceptable failure mode, I'm back-patching this change, since it seems like a poor idea to rely on this happening to work.	2012-01-06 08:35:41 -05:00
Tom Lane	658ee01086	Make executor's SELECT INTO code save and restore original tuple receiver. As previously coded, the QueryDesc's dest pointer was left dangling (pointing at an already-freed receiver object) after ExecutorEnd. It's a bit astonishing that it took us this long to notice, and I'm not sure that the known problem case with SQL functions is the only one. Fix it by saving and restoring the original receiver pointer, which seems the most bulletproof way of ensuring any related bugs are also covered. Per bug #6379 from Paul Ramsey. Back-patch to 8.4 where the current handling of SELECT INTO was introduced.	2012-01-04 18:31:01 -05:00
Tom Lane	188f1b9282	Fix coerce_to_target_type for coerce_type's klugy handling of COLLATE. Because coerce_type recurses into the argument of a CollateExpr, coerce_to_target_type's longstanding code for detecting whether coerce_type had actually done anything (to wit, returned a different node than it passed in) was broken in 9.1. This resulted in unexpected failures in hide_coercion_node; which was not the latter's fault, since it's critical that we never call it on anything that wasn't inserted by coerce_type. (Else we might decide to "hide" a user-written function call.) Fix by removing and replacing the CollateExpr in coerce_to_target_type itself. This is all pretty ugly but I don't immediately see a way to make it nicer. Per report from Jean-Yves F. Barbier.	2012-01-02 14:43:51 -05:00
Tom Lane	71b23708d4	Update per-column ACLs, not only per-table ACL, when changing table owner. We forgot to modify column ACLs, so privileges were still shown as having been granted by the old owner. This meant that neither the new owner nor a superuser could revoke the now-untraceable-to-table-owner permissions. Per bug #6350 from Marc Balmer. This has been wrong since column ACLs were added, so back-patch to 8.4.	2011-12-21 18:23:18 -05:00
Tom Lane	5d7d12de56	Fix gincostestimate to handle ScalarArrayOpExpr reasonably. The original coding of this function overlooked the possibility that it could be passed anything except simple OpExpr indexquals. But ScalarArrayOpExpr is possible too, and the code would probably crash (and surely give ridiculous answers) in such a case. Add logic to try to estimate sanely for such cases. In passing, fix the treatment of inner-indexscan cost estimation: it was failing to scale up properly for multiple iterations of a nestloop. (I think somebody might've thought that index_pages_fetched() is linear, but of course it's not.) Report, diagnosis, and preliminary patch by Marti Raudsepp; I refactored it a bit and fixed the cost estimation. Back-patch into 9.1 where the bogus code was introduced.	2011-12-20 19:57:40 -05:00
Tom Lane	a63a7a5b09	Avoid crashing when we have problems unlinking files post-commit. smgrdounlink takes care to not throw an ERROR if it fails to unlink something, but that caution was rendered useless by commit `3396000684`, which put an smgrexists call in front of it; smgrexists does throw error if anything looks funny, such as getting a permissions error from trying to open the file. If that happens post-commit, you get a PANIC, and what's worse the same logic appears in the WAL replay code, so the database even fails to restart. Restore the intended behavior by removing the smgrexists call --- it isn't accomplishing anything that we can't do better by adjusting mdunlink's ideas of whether it ought to warn about ENOENT or not. Per report from Joseph Shraibman of unrecoverable crash after trying to drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions. Backpatch to 8.4, where the bogus coding was introduced.	2011-12-20 15:00:41 -05:00
Heikki Linnakangas	6cf639dfbd	Revert the behavior of inet/cidr functions to not unpack the arguments. I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.	2011-12-12 10:05:15 +02:00

1 2 3 4 5 ...

12137 commits