postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-01 21:01:12 -05:00

Author	SHA1	Message	Date
Andrew Dunstan	7f89fc4182	Add a utility function to extract variadic function arguments This is epecially useful in the case or "VARIADIC ANY" functions. The caller can get the artguments and types regardless of whether or not and explicit VARIADIC array argument has been used. The function also provides an option to convert arguments on type "unknown" to to "text". Michael Paquier and me, reviewed by Tom Lane. Backpatch to 9.4 in order to support the following json bug fix.	2017-10-25 07:20:48 -04:00
Tom Lane	0e84f70752	Fix low-probability loss of NOTIFY messages due to XID wraparound. Up to now async.c has used TransactionIdIsInProgress() to detect whether a notify message's source transaction is still running. However, that function has a quick-exit path that reports that XIDs before RecentXmin are no longer running. If a listening backend is doing nothing but listening, and not running any queries, there is nothing that will advance its value of RecentXmin. Once 2 billion transactions elapse, the RecentXmin check causes active transactions to be reported as not running. If they aren't committed yet according to CLOG, async.c decides they aborted and discards their messages. The timing for that is a bit tight but it can happen when multiple backends are sending notifies concurrently. The net symptom therefore is that a sufficiently-long-surviving listen-only backend starts to miss some fraction of NOTIFY traffic, but only under heavy load. The only function that updates RecentXmin is GetSnapshotData(). A brute-force fix would therefore be to take a snapshot before processing incoming notify messages. But that would add cycles, as well as contention for the ProcArrayLock. We can be smarter: having taken the snapshot, let's use that to check for running XIDs, and not call TransactionIdIsInProgress() at all. In this way we reduce the number of ProcArrayLock acquisitions from one per message to one per notify interrupt; that's the same under light load but should be a benefit under heavy load. Light testing says that this change is a wash performance-wise for normal loads. I looked around for other callers of TransactionIdIsInProgress() that might be at similar risk, and didn't find any; all of them are inside transactions that presumably have already taken a snapshot. Problem report and diagnosis by Marko Tiikkaja, patch by me. Back-patch to all supported branches, since it's been like this since 9.0. Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org	2017-10-11 14:28:33 -04:00
Tom Lane	bfb69b1e5d	Fix crash when logical decoding is invoked from a PL function. The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit `7ec1c5a86`. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com	2017-10-06 19:18:59 -04:00
Alvaro Herrera	8b6d85f2dc	Fix traversal of half-frozen update chains When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complain like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid contraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit `37484ad2a` "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after `20b6552242` fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com	2017-10-06 17:14:42 +02:00
Tom Lane	3a07a6f3e8	Avoid SIGBUS on Linux when a DSM memory request overruns tmpfs. On Linux, shared memory segments created with shm_open() are backed by swap files created in tmpfs. If the swap file needs to be extended, but there's no tmpfs space left, you get a very unfriendly SIGBUS trap. To avoid this, force allocation of the full request size when we create the segment. This adds a few cycles, but none that we wouldn't expend later anyway, assuming the request isn't hugely bigger than the actual need. Make this code #ifdef __linux__, because (a) there's not currently a reason to think the same problem exists on other platforms, and (b) applying posix_fallocate() to an FD created by shm_open() isn't very portable anyway. Back-patch to 9.4 where the DSM code came in. Thomas Munro, per a bug report from Amul Sul Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com	2017-09-25 16:09:20 -04:00
Tom Lane	53863ebeba	Make [U]INT64CONST safe for use in #if conditions. Instead of using a cast to force the constant to be the right width, assume we can plaster on an L, UL, LL, or ULL suffix as appropriate. The old approach to this is very hoary, dating from before we were willing to require compilers to have working int64 types. This fix makes the PG_INT64_MIN, PG_INT64_MAX, and PG_UINT64_MAX constants safe to use in preprocessor conditions, where a cast doesn't work. Other symbolic constants that might be defined using [U]INT64CONST are likewise safer than before. Also fix the SIZE_MAX macro to be similarly safe, if we are forced to provide a definition for that. The test added in commit `2e70d6b5e` happens to do what we want even with the hack "(size_t) -1" definition, but we could easily get burnt on other tests in future. Back-patch to all supported branches, like the previous commits. Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us	2017-09-01 15:14:18 -04:00
Tom Lane	9a98557662	Ensure SIZE_MAX can be used throughout our code. Pre-C99 platforms may lack <stdint.h> and thereby SIZE_MAX. We have a couple of places using the hack "(size_t) -1" as a fallback, but it wasn't universally available; which means the code added in commit `2e70d6b5e` fails to compile everywhere. Move that hack to c.h so that we can rely on having SIZE_MAX everywhere. Per discussion, it'd be a good idea to make the macro's value safe for use in #if-tests, but that will take a bit more work. This is just a quick expedient to get the buildfarm green again. Back-patch to all supported branches, like the previous commit. Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us	2017-09-01 13:52:54 -04:00
Tom Lane	ca3f8299ef	Stamp 9.4.14.	2017-08-28 17:26:11 -04:00
Andres Freund	3d58994ecc	Backpatch introduction of TupleDescAttr(tupdesc, i). `2cd7084524` / `c6293249d` change the way individual attributes in a TupleDesc are stored / accessed. To reduce the effort of making extensions compatible with postgresql 11, and to ease future backpatching, backpatch introduction of TupleDescAttr() to all releases. Do not backpatch change in storage, as that'd be a breaking change for existing and working extensions. Author: Andres Freund Discussion: https://postgr.es/m/20170820181723.tdswdinzptbcwhrr@alap3.anarazel.de Backpatch: 9.2-	2017-08-22 07:47:48 -07:00
Tom Lane	5a3688dc88	Remove AtEOXact_CatCache(). The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com	2017-08-13 16:15:14 -04:00
Tom Lane	6d81d0a215	Stamp 9.4.13.	2017-08-07 17:15:44 -04:00
Tatsuo Ishii	a282d1871c	Fix comment. XLByteToSeg and XLByteToPrevSeg calculate only a segment number. The definition of these macros were modified by commit `dfda6ebaec` but the comment remain unchanged. Patch by Yugo Nagata. Back patched to 9.3 and beyond.	2017-08-01 08:08:54 +09:00
Andres Freund	a27c566c77	Fix leaking of small spilled subtransactions during logical decoding. When, during logical decoding, a transaction gets too big, it's contents get spilled to disk. Not just the top-transaction gets spilled, but also all of its subtransactions, even if they're not that large themselves. Unfortunately we didn't clean up such small spilled subtransactions from disk. Fix that, by keeping better track of whether a transaction has been spilled to disk. Author: Andres Freund Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello Discussion: https://postgr.es/m/1457621358.355011041@f382.i.mail.ru https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com Backpatch: 9.4-, where logical decoding was introduced	2017-06-18 19:14:09 -07:00
Andres Freund	55d7027d58	Unify SIGHUP handling between normal and walsender backends. Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0	2017-06-05 19:18:16 -07:00
Andres Freund	1cdc0ab9c1	Prevent possibility of panics during shutdown checkpoint. When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced	2017-06-05 19:18:16 -07:00
Andres Freund	79abd23db1	Fix race condition leading to hanging logical slot creation. The snapshot assembly during the creation of logical slots relied waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xact record listing transactions as ready, that have already WAL-logged an commit/abort record, as the locking just prevents the ProcArray to be adjusted, and the commit record has to be logged first. That lead to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. That hang resolved only if a xl_running_xacts record without any running transactions happened to be logged, far from certain on a busy server. It's impractical to prevent that via more heavyweight locking, the likelihood of deadlocks and significantly increased contention would be too big. Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid - that actually ends up significantly simplifying the code. That has two disadvantages: 1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish. 2) Previously a slot, unless the race occurred, could be created when the all transaction perceived as running based on commit/abort records, now we have to wait for the next xl_running_xacts record. To address that, trigger logging new xl_running_xacts record from within snapbuild.c exactly when necessary. Unfortunately snabuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced	2017-05-13 14:21:00 -07:00
Tom Lane	32900700f4	Avoid searching for callback functions in CallSyscacheCallbacks(). We have now grown enough registerable syscache-invalidation callback functions that the original assumption that there would be few of them is causing performance problems. In particular, let's fix things so that CallSyscacheCallbacks doesn't have to search the whole array to find which callback(s) to invoke for a given cache ID. Preserve the original behavior that callbacks are called in order of registration, just in case there's someplace that depends on that (which I doubt). In support of this, export the number of syscaches from syscache.h. People could have found that out anyway from the enum, but adding a #define makes that much safer. This provides a useful additional speedup in Mathieu Fenniak's logical-decoding test case, although we're reaching the point of diminishing returns there. I think any further improvement will have to come from reducing the number of cache invalidations that are triggered in the first place. Still, we can hope that this change gives some incremental benefit for all invalidation scenarios. Back-patch to 9.4 where logical decoding was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com	2017-05-12 19:05:30 -04:00
Tom Lane	64417f8d35	Avoid searching for the target catcache in CatalogCacheIdInvalidate. A test case provided by Mathieu Fenniak shows that the initial search for the target catcache in CatalogCacheIdInvalidate consumes a very significant amount of overhead in cases where cache invalidation is triggered but has little useful work to do. There is no good reason for that search to exist at all, as the index array maintained by syscache.c allows direct lookup of the catcache from its ID. We just need a frontend function in syscache.c, matching the division of labor for most other cache-accessing operations. While there's more that can be done in this area, this patch alone reduces the runtime of Mathieu's example by 2X. We can hope that it offers some useful benefit in other cases too, although usually cache invalidation overhead is not such a striking fraction of the total runtime. Back-patch to 9.4 where logical decoding was introduced. It might be worth going further back, but presently the only case we know of where cache invalidation is really a significant burden is in logical decoding. Also, older branches have fewer catcaches, reducing the possible benefit. (Note: although this nominally changes catcache's API, we have always documented CatalogCacheIdInvalidate as a private function, so I would have little sympathy for an external module calling it directly. So backpatching should be fine.) Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com	2017-05-12 18:17:29 -04:00
Tom Lane	34af9129e6	Stamp 9.4.12.	2017-05-08 17:19:04 -04:00
Tom Lane	f793effdc7	Further patch rangetypes_selfuncs.c's statistics slot management. Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8, not of the type of the column the statistics are for. This bug is at least partly the fault of sloppy specification comments for get_attstatsslot()/free_attstatsslot(): the type OID they want is that of the stavalues entries, not of the underlying column. (I double-checked other callers and they seem to get this right.) Adjust the comments to be more correct. Per buildfarm. Security: CVE-2017-7484	2017-05-08 15:02:58 -04:00
Peter Eisentraut	3e5ea1f9b2	Add security checks to selectivity estimation functions Some selectivity estimation functions run user-supplied operators over data obtained from pg_statistic without security checks, which allows those operators to leak pg_statistic data without having privileges on the underlying tables. Fix by checking that one of the following is satisfied: (1) the user has table or column privileges on the table underlying the pg_statistic data, or (2) the function implementing the user-supplied operator is leak-proof. If neither is satisfied, planning will proceed as if there are no statistics available. At least one of these is satisfied in most cases in practice. The only situations that are negatively impacted are user-defined or not-leak-proof operators on a security-barrier view. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Peter Eisentraut <peter_e@gmx.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Security: CVE-2017-7484	2017-05-08 09:19:15 -04:00
Heikki Linnakangas	96d0f988b1	Give nicer error message when connecting to a v10 server requiring SCRAM. This is just to give the user a hint that they need to upgrade, if they try to connect to a v10 server that uses SCRAM authentication, with an older client. Commit to all stable branches, but not master. Discussion: https://www.postgresql.org/message-id/bbf45d92-3896-eeb7-7399-2111d517261b@pivotal.io	2017-05-05 11:24:29 +03:00
Andres Freund	b6ecf26ccc	Don't use on-disk snapshots for exported logical decoding snapshot. Logical decoding stores historical snapshots on disk, so that logical decoding can restart without having to reconstruct a snapshot from scratch (for which the resources are not guaranteed to be present anymore). These serialized snapshots were also used when creating a new slot via the walsender interface, which can export a "full" snapshot (i.e. one that can read all tables, not just catalog ones). The problem is that the serialized snapshots are only useful for catalogs and not for normal user tables. Thus the use of such a serialized snapshot could result in an inconsistent snapshot being exported, which could lead to queries returning wrong data. This would only happen if logical slots are created while another logical slot already exists. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 15:29:52 -07:00
Andres Freund	5da6461387	Preserve required !catalog tuples while computing initial decoding snapshot. The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol. This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to it hit it. Ongoing decoding is not affected by this bug. To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok. In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock. Reported-By: Erik Rijkers Analyzed-By: Petr Jelinek Author: Petr Jelinek, heavily editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 13:13:37 -07:00
Tom Lane	bb132cddf8	Support OpenSSL 1.1.0 in 9.4 branch. This commit back-patches the equivalent of the 9.5-branch commits `e2838c580` and `48e5ba61e`, so that we can work with OpenSSL 1.1.0 in 9.4. (Going further back would be a good thing but will take more work; meanwhile let's see what the buildfarm makes of this.) Original patches by Andreas Karlsson and Heikki Linnakangas, back-patching work by Andreas Karlsson. Patch: https://postgr.es/m/0c817abb-3f7d-20fb-583a-58f7593a0bea@proxel.se Discussion: https://postgr.es/m/5129.1492293840@sss.pgh.pa.us	2017-04-15 20:16:03 -04:00
Tom Lane	89a41a1b6b	Improve castNode notation by introducing list-extraction-specific variants. This extends the castNode() notation introduced by commit `5bcab1114` to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us	2017-04-10 13:51:29 -04:00
Heikki Linnakangas	88101abe70	Remove dead code and fix comments in fast-path function handling. HandleFunctionRequest() is no longer responsible for reading the protocol message from the client, since commit `2b3a8b20c2`. Fix the outdated comments. HandleFunctionRequest() now always returns 0, because the code that used to return EOF was moved in `2b3a8b20c2`. Therefore, the caller no longer needs to check the return value. Reported by Andres Freund. Backpatch to all supported versions, even though this doesn't have any user-visible effect, to make backporting future patches in this area easier. Discussion: https://www.postgresql.org/message-id/20170405010525.rt5azbya5fkbhvrx@alap3.anarazel.de	2017-04-06 09:11:22 +03:00
Tom Lane	8851bcf881	Fix integer-overflow problems in interval comparison. When using integer timestamps, the interval-comparison functions tried to compute the overall magnitude of an interval as an int64 number of microseconds. As reported by Frazer McLean, this overflows for intervals exceeding about 296000 years, which is bad since we nominally allow intervals many times larger than that. That results in wrong comparison results, and possibly in corrupted btree indexes for columns containing such large interval values. To fix, compute the magnitude as int128 instead. Although some compilers have native support for int128 calculations, many don't, so create our own support functions that can do 128-bit addition and multiplication if the compiler support isn't there. These support functions are designed with an eye to allowing the int128 code paths in numeric.c to be rewritten for use on all platforms, although this patch doesn't do that, or even provide all the int128 primitives that will be needed for it. Back-patch as far as 9.4. Earlier releases did not guard against overflow of interval values at all (commit `146604ec4` fixed that), so it seems not very exciting to worry about overly-large intervals for them. Before 9.6, we did not assume that unreferenced "static inline" functions would not draw compiler warnings, so omit functions not directly referenced by timestamp.c, the only present consumer of int128.h. (We could have omitted these functions in HEAD too, but since they were written and debugged on the way to the present patch, and they look likely to be needed by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent such warnings in a --disable-integer-datetimes build, though. Before 9.5, configure will never define HAVE_INT128, so the part of int128.h that exploits a native int128 implementation is dead code in the 9.4 branch. I didn't bother to remove it, thinking that keeping the file looking similar in different branches is more useful. In HEAD only, add a simple test harness for int128.h in src/tools/. In back branches, this does not change the float-timestamps code path. That's not subject to the same kind of overflow risk, since it computes the interval magnitude as float8. (No doubt, when this code was originally written, overflow was disregarded for exactly that reason.) There is a precision hazard instead :-(, but we'll avert our eyes from that question, since no complaints have been reported and that code's deprecated anyway. Kyotaro Horiguchi and Tom Lane Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com	2017-04-05 23:51:28 -04:00
Robert Haas	bbd5e600ff	Fix failure to mark init buffers as BM_PERMANENT. This could result in corruption of the init fork of an unlogged index if the ambuildempty routine for that index used shared buffers to create the init fork, which was true for gin, gist, and hash indexes. Patch by me, based on an earlier patch by Michael Paquier, who also reviewed this one. This also incorporates an idea from Artur Zakirov. Discussion: http://postgr.es/m/CACYUyc8yccE4xfxhqxfh_Mh38j7dRFuxfaK1p6dSNAEUakxUyQ@mail.gmail.com	2017-03-14 12:06:36 -04:00
Tom Lane	c3b5cfe33c	Formatting and docs corrections for logical decoding output plugins. Make the typedefs for output plugins consistent with project style; they were previously not even consistent with each other as to layout or inclusion of parameter names. Make the documentation look the same, and fix errors therein (missing and misdescribed parameters). Back-patch because of the documentation bugs.	2017-02-15 18:15:47 -05:00
Tom Lane	01306452b1	Stamp 9.4.11.	2017-02-06 16:49:02 -05:00
Heikki Linnakangas	1dd06ede17	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:34:24 +02:00
Tom Lane	d02f038c35	Orthography fixes for new castNode() macro. Clean up hastily-composed comment. Normalize whitespace. Erik Rijkers and myself	2017-01-27 08:33:58 -05:00
Andres Freund	cf8c86af95	Add castNode(type, ptr) for safe casting between NodeTag based types. The new function allows to cast from one NodeTag based type to another, while asserting that the conversion is valid. This replaces the common pattern of doing a cast and a Assert(IsA(ptr, type)) close-by. As this seems likely to be used pervasively, we decided to backpatch this change the addition of this macro. Otherwise backpatched fixes are more likely not to work on back-branches. On branches before 9.6, where we do not yet rely on inline functions being available, the type assertion is only performed if PG_USE_INLINE support is detected. The cast obviously is performed regardless. For the benefit of verifying the macro compiles in the back-branches, this commit contains a single use of the new macro. On master, a somewhat larger conversion will be committed separately. Author: Peter Eisentraut and Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com Backpatch: 9.2-	2017-01-26 16:47:03 -08:00
Alvaro Herrera	75c155f65b	Change some test macros to return true booleans These macros work fine when they are used directly in an "if" test or similar, but as soon as the return values are assigned to boolean variables (or passed as boolean arguments to some function), they become bugs, hopefully caught by compiler warnings. To avoid future problems, fix the definitions so that they return actual booleans. To further minimize the risk that somebody uses them in back-patched fixes that only work correctly in branches starting from the current master and not in old ones, back-patch the change to supported branches as appropriate. See also commit `af4472bcb8`, and the long discussion (and larger patch) in the thread mentioned in its commit message. Discussion: https://postgr.es/m/18672.1483022414@sss.pgh.pa.us	2017-01-18 18:06:13 -03:00
Alvaro Herrera	30e3cb307d	Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require complex interlocking that matched the requirements on the master. This required an O(N) operation that became a significant problem with large indexes, causing replication delays of seconds or in some cases minutes while the XLOG_BTREE_VACUUM was replayed. This commit skips the “pin scan” that was previously required, by observing in detail when and how it is safe to do so, with full documentation. The pin scan is skipped only in replay; the VACUUM code path on master is not touched here. No tests included. Manual tests using an additional patch to view WAL records and their timing have shown the change in WAL records and their handling has successfully reduced replication delay. This is a back-patch of commits `687f2cd7a0`, `3e4b7d8798`, `b602842613` by Simon Riggs, to branches 9.4 and 9.5. No further backpatch is possible because this depends on catalog scans being MVCC. I (Álvaro) additionally updated a slight problem in the README, which explains why this touches the 9.6 and master branches.	2016-11-17 13:31:30 -03:00
Tom Lane	3e844a34b8	Account for catalog snapshot in PGXACT->xmin updates. The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting for whether MyPgXact->xmin could be cleared or advanced. In normal transactions this was masked by the fact that the transaction snapshot would be older, but during backend startup and certain utility commands it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had been cleared, meaning that recently-deleted rows could be pruned even though this snapshot could still see them, causing unexpected catalog lookup failures. This effect appears to be the explanation for a recent failure on buildfarm member piculet. To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever it is valid. In the previous logic, it was possible for the CatalogSnapshot to remain valid across waits for client input, but with this change that would mean it delays advance of global xmin in cases where it did not before. To avoid possibly causing new table-bloat problems with clients that sit idle for long intervals, add code to invalidate the CatalogSnapshot before waiting for client input. (When the backend is busy, it's unlikely that the CatalogSnapshot would be the oldest snap for very long, so we don't worry about forcing early invalidation of it otherwise.) In passing, remove the CatalogSnapshotStale flag in favor of using "CatalogSnapshot != NULL" to represent validity, as we do for the other special snapshots in snapmgr.c. And improve some obsolete comments. No regression test because I don't know a deterministic way to cause this failure. But the stress test shown in the original discussion provokes "cache lookup failed for relation 1255" within a few dozen seconds for me. Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's quite easy to produce similar failures with the same test case in branches before 9.4. But MVCC catalog scans were supposed to fix that.) Discussion: <16447.1478818294@sss.pgh.pa.us>	2016-11-15 15:55:36 -05:00
Tom Lane	c7681b2b9a	Stamp 9.4.10.	2016-10-24 16:12:53 -04:00
Tom Lane	80296813cc	Revert addition of PGDLLEXPORT in PG_FUNCTION_INFO_V1 macro. This turns out not to be as harmless as I thought: MSVC will complain if it sees an "extern" declaration without PGDLLEXPORT and then one with. (Seems fairly silly, given that this can be changed after the fact by the linker, but there you have it.) Therefore, contrib modules that have extern's for V1 functions in header files are falling over in the buildfarm, since none of those externs are marked PGDLLEXPORT. We might or might not conclude that we're willing to plaster those declarations with PGDLLEXPORT in HEAD, but in any case there's no way we're going to ship this change in the back branches. Third-party authors would not thank us for breaking their code in a minor release. Hence, revert the addition of PGDLLEXPORT (but let's keep the extra info in the comment). If we do the other changes we can revert this commit in HEAD. Per buildfarm.	2016-10-12 18:02:12 -04:00
Tom Lane	ee6810fb46	Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro. This isn't really necessary for our own code, because we use a .DEF file in MSVC builds (see gendef.pl), or --export-all-symbols in MinGW and Cygwin builds, to ensure that all global symbols in loadable modules will be exported on Windows. However, third-party authors might use different build processes that need this marker, and it's harmless enough for our own builds. To some extent, this is an oversight in commit `e7128e8db`, so back-patch to 9.4 where that was added. Laurenz Albe Discussion: <A737B7A37273E048B164557ADEF4A58B539300BD@ntex2010a.host.magwien.gv.at>	2016-10-12 12:46:16 -04:00
Heikki Linnakangas	69ade952e9	Fix copy-pasto in comment. Amit Langote	2016-10-12 12:08:59 +03:00
Tom Lane	32cdf680f1	Avoid using PostmasterRandom() for DSM control segment ID. Commits `470d886c3` et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>	2016-09-23 09:54:11 -04:00
Robert Haas	626312d1b3	Use PostmasterRandom(), not random(), for DSM control segment ID. Otherwise, every startup gets the same "random" value, which is definitely not what was intended.	2016-09-20 12:30:42 -04:00
Tom Lane	ebd971f74d	Improve unreachability recognition in elog() macro. Some experimentation with an older version of gcc showed that it is able to determine whether "if (elevel_ >= ERROR)" is compile-time constant if elevel_ is declared "const", but otherwise not so much. We had accounted for that in ereport() but were too miserly with braces to make it so in elog(). I don't know how many currently-interesting compilers have the same quirk, but in case it will save some code space, let's make sure that elog() is on the same footing as ereport() for this purpose. Back-patch to 9.3 where we introduced pg_unreachable() calls into elog/ereport.	2016-09-10 17:54:23 -04:00
Tom Lane	08a823e53b	Fix improper repetition of previous results from a hashed aggregate. ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>	2016-08-24 14:37:51 -04:00
Tom Lane	4ac3d77947	Remove bogus dependencies on NUMERIC_MAX_PRECISION. NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision and scale you can write in a numeric typmod. It might once have had something to do with the allowed range of a typmod-less numeric value, but at least since 9.1 we've allowed, and documented that we allowed, any value that would physically fit in the numeric storage format; which is something over 100000 decimal digits, not 1000. Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit on the allowed range of the exponent in scientific-format input. That was especially silly in view of the fact that you can enter larger numbers as long as you don't use 'e' to do it. Just constrain the value enough to avoid localized overflow, and let make_result be the final arbiter of what is too large. Likewise adjust ecpg's equivalent of this code. Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the number of base-NBASE digits it would accept. That created a dump/restore hazard for binary COPY without doing anything useful; the wire-format limit on number of digits (65535) is about as tight as we would want. In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what the numeric range limit is. That code doesn't exist in the back branches. Per gripe from Aravind Kumar. Back-patch to all supported branches, since they all contain the documentation claim about allowed range of NUMERIC (cf commit `cabf5d84b`). Discussion: <2895.1471195721@sss.pgh.pa.us>	2016-08-14 15:06:01 -04:00
Tom Lane	860f8fb1aa	Stamp 9.4.9.	2016-08-08 16:29:39 -04:00
Tom Lane	8f180a6cca	Fix misestimation of n_distinct for a nearly-unique column with many nulls. If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <16143.1470350371@sss.pgh.pa.us>	2016-08-07 18:52:02 -04:00
Tom Lane	45e5496042	Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch. This coding pattern creates a race condition, because if an interesting interrupt happens after we've checked InterruptPending but before we reset our latch, the latch-setting done by the signal handler would get lost, and then we might block at WaitLatch in the next iteration without ever noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, but not between them. Aside from fixing the bugs, add some explanatory comments to latch.h to perhaps forestall the next person from making the same mistake. In HEAD, also replace gather_readnext's direct call of HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean or useful for this one caller to bypass ProcessInterrupts and go straight to HandleParallelMessages; not least because that fails to consider the InterruptPending flag, resulting in useless work both here (if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call (if it is). This thinko seems to have been introduced in the initial coding of storage/ipc/shm_mq.c (commit `ec9037df2`), and then blindly copied into all the subsequent parallel-query support logic. Back-patch relevant hunks to 9.4 to extirpate the error everywhere. Discussion: <1661.1469996911@sss.pgh.pa.us>	2016-08-01 15:13:53 -04:00
Tom Lane	66f7e40812	Fix assorted fallout from IS [NOT] NULL patch. Commits `4452000f3` et al established semantics for NullTest.argisrow that are a bit different from its initial conception: rather than being merely a cache of whether we've determined the input to have composite type, the flag now has the further meaning that we should apply field-by-field testing as per the standard's definition of IS [NOT] NULL. If argisrow is false and yet the input has composite type, the construct instead has the semantics of IS [NOT] DISTINCT FROM NULL. Update the comments in primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print such cases correctly. In the case of ruleutils.c, this merely results in cosmetic changes in EXPLAIN output, since the case can't currently arise in stored rules. However, it represents a live bug for deparse.c, which would formerly have sent a remote query that had semantics different from the local behavior. (From the user's standpoint, this means that testing a remote nested-composite column for null-ness could have had unexpected recursive behavior much like that fixed in 4452000f3.) In a related but somewhat independent fix, make plancat.c set argisrow to false in all NullTest expressions constructed to represent "attnotnull" constructs. Since attnotnull is actually enforced as a simple null-value check, this is a more accurate representation of the semantics; we were previously overpromising what it meant for composite columns, which might possibly lead to incorrect planner optimizations. (It seems that what the SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so arguably we are violating the spec and should fix attnotnull to do the other thing. If we ever do, this part should get reverted.) Back-patch, same as the previous commit. Discussion: <10682.1469566308@sss.pgh.pa.us>	2016-07-28 16:09:15 -04:00

1 2 3 4 5 ...

6588 commits