postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-02-09 22:04:24 -05:00

Author	SHA1	Message	Date
Stephen Frost	cb9dcbc1ee	Bump catversion for restrictive RLS changes Mea culpa. Pointed out by Andres.	2016-12-06 10:12:31 -05:00
Tom Lane	3ebf2b4545	Remove extraneous semicolon from uses of relptr_declare(). If we're going to write a semicolon after calls of relptr_declare(), then we don't need one inside the macro, and removing it suppresses "empty declaration" warnings from pickier compilers (eg pademelon). While at it, we might as well use relptr() inside relptr_declare(), because otherwise that macro would likely go unused altogether. Also improve the comment, which I for one found unclear, and provide a specific example of intended usage.	2016-12-05 20:27:55 -05:00
Stephen Frost	093129c9d9	Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammer, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. In passing, also move away from using "AND"d and "OR"d in comments. As pointed out by Alvaro, it's not really appropriate to attempt to make verbs out of "AND" and "OR", so reword those comments which attempted to. Reviewed By: Jeevan Chalke, Dean Rasheed Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net	2016-12-05 15:50:55 -05:00
Robert Haas	88f626f868	Fix more DSA problems uncovered by the buildfarm. On 32-bit systems, don't try to use 64-bit DSA pointers, because the computation of DSA_MAX_SEGMENT_SIZE overflows Size. Cast 1 to Size before shifting it, so that the compiler doesn't produce a result of the wrong width. In passing, change one use of size_t to Size.	2016-12-05 10:38:08 -05:00
Robert Haas	670b3bc8f5	Try to fix some DSA-related compiler warnings. Commit `13df76a537` was overconfident about how portable %016lx is. Some compilers complain because they need %016llx, while platforms where DSA pointers are only 32 bits get unhappy about using a 64-bit format for a 32-bit quantity. Thomas Munro, per an off-list suggestion from me.	2016-12-05 10:01:08 -05:00
Heikki Linnakangas	fe0a0b5993	Replace PostmasterRandom() with a stronger source, second attempt. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom Unlike the current pgcrypto function, the source is chosen by configure. That makes it easier to test different implementations, and ensures that we don't accidentally fall back to a less secure implementation, if the primary source fails. All of those methods are quite reliable, it would be pretty surprising for them to fail, so we'd rather find out by failing hard. If no strong random source is available, we fall back to using erand48(), seeded from current timestamp, like PostmasterRandom() was. That isn't cryptographically secure, but allows us to still work on platforms that don't have any of the above stronger sources. Because it's not very secure, the built-in implementation is only used if explicitly requested with --disable-strong-random. This replaces the more complicated Fortuna algorithm we used to have in pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom, so it doesn't seem worth the maintenance effort to keep that. pgcrypto functions that require strong random numbers will be disabled with --disable-strong-random. Original patch by Magnus Hagander, tons of further work by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com	2016-12-05 13:42:59 +02:00
Robert Haas	767a9039d7	Fix thinko in `b3427dade1`.	2016-12-02 15:06:41 -05:00
Tom Lane	b3427dade1	Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits. deleteWhatDependsOn() had grown an uncomfortably large number of assumptions about what it's used for. There are actually only two minor differences between what it does and what a regular performDeletion() call can do, so let's invent additional bits in performDeletion's existing flags argument that specify those behaviors, and get rid of deleteWhatDependsOn() as such. (We'd probably have done it this way from the start, except that performDeletion didn't originally have a flags argument, IIRC.) Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an extension, and use that when dropping temporary objects at session end. This provides a more general solution to the problem addressed in a hacky way in commit `08dd23cec`: if an extension script creates temp objects and forgets to remove them again, the whole extension went away when its contained temp objects were deleted. The previous solution only covered temp relations, but this solves it for all object types. These changes require minor additions in dependency.c to pass the flags to subroutines that previously didn't get them, but it's still a net savings of code, and it seems cleaner than before. Having done this, revert the special-case code added in `08dd23cec` that prevented addition of pg_depend records for temp table extension membership, because that caused its own oddities: dropping an extension that had created such a table didn't automatically remove the table, leading to a failure if the table had another dependency on the extension (such as use of an extension data type), or to a duplicate-name failure if you then tried to recreate the extension. But we keep the part that prevents the pg_temp_nnn schema from becoming an extension member; we never want that to happen. Add a regression test case covering these behaviors. Although this fixes some arguable bugs, we've heard few field complaints, and any such problems are easily worked around by explicitly dropping temp objects at the end of extension scripts (which seems like good practice anyway). So I won't risk a back-patch. Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com	2016-12-02 14:57:55 -05:00
Robert Haas	13df76a537	Introduce dynamic shared memory areas. Programmers discovered decades ago that it was useful to have a simple interface for allocating and freeing memory, which is why malloc() and free() were invented. Unfortunately, those handy tools don't work with dynamic shared memory segments because those are specific to PostgreSQL and are not necessarily mapped at the same address in every cooperating process. So invent our own allocator instead. This makes it possible for processes cooperating as part of parallel query execution to allocate and free chunks of memory without having to reserve them prior to the start of execution. It could also be used for longer lived objects; for example, we could consider storing data for pg_stat_statements or the stats collector in shared memory using these interfaces, rather than writing them to files. Basically, anything that needs shared memory but can't predict in advance how much it's going to need might find this useful. Thomas Munro and Robert Haas. The original code (of mine) on which Thomas based his work was actually designed to be a new backend-local memory allocator for PostgreSQL, but that hasn't gone anywhere - or not yet, anyway. Thomas took that work and performed major refactoring and extensive modifications to make it work with dynamic shared memory, including the addition of appropriate locking. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com	2016-12-02 12:34:36 -05:00
Robert Haas	13e14a78ea	Management of free memory pages. This is intended as infrastructure for a full-fledged allocator for dynamic shared memory. The interface looks a bit like a real allocator, but only supports allocating and freeing memory in multiples of the 4kB page size. Further, to free memory, you must know the size of the span you wish to free, in pages. While these are make it unsuitable as an allocator in and of itself, it still serves as very useful scaffolding for a full-fledged allocator. Robert Haas and Thomas Munro. This code is mostly the same as my 2014 submission, but Thomas fixed quite a few bugs and made some changes to the interface. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com	2016-12-02 12:03:30 -05:00
Robert Haas	fbc1c12a94	Add a crude facility for dealing with relative pointers. C doesn't have any sort of built-in understanding of a pointer relative to some arbitrary base address, but dynamic shared memory segments can be mapped at different addresses in different processes, so any sort of shared data structure stored within a dynamic shared memory segment can't use absolute pointers. We could use something like Size to represent a relative pointer, but then the compiler provides no type-checking. Use stupid macro tricks to get some type-checking. Patch originally by me. Concept suggested by Andres Freund. Recently resubmitted as part of Thomas Munro's work on dynamic shared memory allocation. Discussion: 20131205144434.GG12398@alap2.anarazel.de Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com	2016-12-02 11:29:01 -05:00
Robert Haas	b460f5d669	Add max_parallel_workers GUC. Increase the default value of the existing max_worker_processes GUC from 8 to 16, and add a new max_parallel_workers GUC with a maximum of 8. This way, even if the maximum amount of parallel query is happening, there is still room for background workers that do other things, as originally envisioned when max_worker_processes was added. Julien Rouhaud, reviewed by Amit Kapila and by revised by me.	2016-12-02 07:42:58 -05:00
Alvaro Herrera	5714931b07	Fix Windows build for `78c8c81439` Author: Petr Jelínek	2016-12-02 09:40:36 -03:00
Alvaro Herrera	fa2fa99552	Permit dump/reload of not-too-large >1GB tuples Our documentation states that our maximum field size is 1 GB, and that our maximum row size of 1.6 TB. However, while this might be attainable in theory with enough contortions, it is not workable in practice; for starters, pg_dump fails to dump tables containing rows larger than 1 GB, even if individual columns are well below the limit; and even if one does manage to manufacture a dump file containing a row that large, the server refuses to load it anyway. This commit enables dumping and reloading of such tuples, provided two conditions are met: 1. no single column is larger than 1 GB (in output size -- for bytea this includes the formatting overhead) 2. the whole row is not larger than 2 GB There are three related changes to enable this: a. StringInfo's API now has two additional functions that allow creating a string that grows beyond the typical 1GB limit (and "long" string). ABI compatibility is maintained. We still limit these strings to 2 GB, though, for reasons explained below. b. COPY now uses long StringInfos, so that pg_dump doesn't choke trying to emit rows longer than 1GB. c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation for the input tuple, which means that large tuples are accepted on input. Note that at this point we do not apply any further limit to the input tuple size. The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit length words to describe each row; and because the documentation is ambiguous on its signedness and libpq does consider it signed, we cannot use the highest-order bit. Additionally, the StringInfo API uses "int" (which is 4 bytes wide in most platforms) in many places, so we'd need to change that API too in order to improve, which has lots of fallout. Backpatch to 9.5, which is the oldest that has MemoryContextAllocExtended, a necessary piece of infrastructure. We could apply to 9.4 with very minimal additional effort, but any further than that would require backpatching "huge" allocations too. This is the largest set of changes we could find that can be back-patched without breaking compatibility with existing systems. Fixing a bigger set of problems (for example, dumping tuples bigger than 2GB, or dumping fields bigger than 1GB) would require changing the FE/BE protocol and/or changing the StringInfo API in an ABI-incompatible way, neither of which would be back-patchable. Authors: Daniel Vérité, Álvaro Herrera Reviewed by: Tomas Vondra Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql	2016-12-02 00:34:01 -03:00
Peter Eisentraut	78c8c81439	Refactor libpqwalreceiver The whole walreceiver API is now wrapped into a struct, like most of our other loadable module APIs. The libpq connection is no longer a global variable in libpqwalreceiver. Instead, it is encapsulated into a struct that is passed around the functions. This allows multiple walreceivers to run at the same time. Add some rudimentary support for logical replication connections to libpqwalreceiver. These changes are mostly cosmetic and are going to be useful for the future logical replication patches. From: Petr Jelinek <petr@2ndquadrant.com>	2016-12-01 20:23:28 -05:00
Peter Eisentraut	597a87ccc9	Use latch instead of select() in walreceiver Replace use of poll()/select() by WaitLatchOrSocket(), which is more portable and flexible. Also change walreceiver to use its procLatch instead of a custom latch. From: Petr Jelinek <petr@2ndquadrant.com>	2016-12-01 20:23:28 -05:00
Andres Freund	fc4b3dea29	User narrower representative tuples in the hash-agg hashtable. So far the hashtable stored representative tuples in the form of its input slot, with all columns in the hashtable that are not needed (i.e. not grouped upon or functionally dependent) set to NULL. Thats good for saving memory, but it turns out that having tuples full of NULL isn't free. slot_deform_tuple is faster if there's no NULL bitmap even if no NULLs are encountered, and skipping over leading NULLs isn't free. So compute a separate tuple descriptor that only contains the needed columns. As columns have already been moved in/out the slot for the hashtable that does not imply additional per-row overhead. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de	2016-11-30 17:30:09 -08:00
Andres Freund	8ed3f11bb0	Perform one only projection to compute agg arguments. Previously we did a ExecProject() for each individual aggregate argument. That turned out to be a performance bottleneck in queries with multiple aggregates. Doing all the argument computations in one ExecProject() is quite a bit cheaper because ExecProject's fastpath can do the work at once in a relatively tight loop, and because it can get all the required columns with a single slot_getsomeattr and save some other redundant setup costs. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de	2016-11-30 16:20:24 -08:00
Robert Haas	6d46f4783e	Improve hash index bucket split behavior. Previously, the right to split a bucket was represented by a heavyweight lock on the page number of the primary bucket page. Unfortunately, this meant that every scan needed to take a heavyweight lock on that bucket also, which was bad for concurrency. Instead, use a cleanup lock on the primary bucket page to indicate the right to begin a split, so that scans only need to retain a pin on that page, which is they would have to acquire anyway, and which is also much cheaper. In addition to reducing the locking cost, this also avoids locking out scans and inserts for the entire lifetime of the split: while the new bucket is being populated with copies of the appropriate tuples from the old bucket, scans and inserts can happen in parallel. There are minor concurrency improvements for vacuum operations as well, though the situation there is still far from ideal. This patch also removes the unworldly assumption that a split will never be interrupted. With the new code, a split is done in a series of small steps and the system can pick up where it left off if it is interrupted prior to completion. While this patch does not itself add write-ahead logging for hash indexes, it is clearly a necessary first step, since one of the things that could interrupt a split is the removal of electrical power from the machine performing it. Amit Kapila. I wrote the original design on which this patch is based, and did a good bit of work on the comments and README through multiple rounds of review, but all of the code is Amit's. Also reviewed by Jesper Pedersen, Jeff Janes, and others. Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com	2016-11-30 15:39:21 -05:00
Tom Lane	11da83a0e7	Add uuid to the set of types supported by contrib/btree_gist. Paul Jungwirth, reviewed and hacked on by Teodor Sigaev, Ildus Kurbangaliev, Adam Brusselback, Chris Bandy, and myself. Discussion: https://postgr.es/m/CA+renyUEE29=X01JXdz8_TQvo6n9=2XoEBBRnQ8rkLyr+kjPxQ@mail.gmail.com Discussion: https://postgr.es/m/55F6EE82.8080209@sigaev.ru	2016-11-29 14:08:34 -05:00
Robert Haas	273270593f	Mark IsPostmasterEnvironment and IsBackgroundWorker as PGDLLIMPORT. Per request from Craig Ringer.	2016-11-26 10:29:18 -05:00
Tom Lane	dbdfd114f3	Bring some clarity to the defaults for the xxx_flush_after parameters. Instead of confusingly stating platform-dependent defaults for these parameters in the comments in postgresql.conf.sample (with the main entry being a lie on Linux), teach initdb to install the correct platform-dependent value in postgresql.conf, similarly to the way we handle other platform-dependent defaults. This won't do anything for existing 9.6 installations, but since it's effectively only a documentation improvement, that seems OK. Since this requires initdb to have access to the default values, move the #define's for those to pg_config_manual.h; the original placement in bufmgr.h is unworkable because that file can't be included by frontend programs. Adjust the default value for wal_writer_flush_after so that it is 1MB regardless of XLOG_BLCKSZ, conforming to what is stated in both the SGML docs and postgresql.conf. (We could alternatively make it scale with XLOG_BLCKSZ, but I'm not sure I see the point.) Copy-edit related SGML documentation. Fabien Coelho and Tom Lane, per a gripe from Tomas Vondra. Discussion: <30ebc6e3-8358-09cf-44a8-578252938424@2ndquadrant.com>	2016-11-25 18:36:10 -05:00
Robert Haas	e343dfa42b	Remove barrier.h A new thing also called a "barrier" is proposed, but whether we decide to take that patch or not, this file seems to have outlived its usefulness. Thomas Munro	2016-11-22 20:28:24 -05:00
Tom Lane	906bfcad7b	Improve handling of "UPDATE ... SET (column_list) = row_constructor". Previously, the right-hand side of a multiple-column assignment, if it wasn't a sub-SELECT, had to be a simple parenthesized expression list, because gram.y was responsible for "bursting" the construct into independent column assignments. This had the minor defect that you couldn't write ROW (though you should be able to, since the standard says this is a row constructor), and the rather larger defect that unlike other uses of row constructors, we would not expand a "foo." item into multiple columns. Fix that by changing the RHS to be just "a_expr" in the grammar, leaving it to transformMultiAssignRef to separate the elements of a RowExpr; which it will do only after performing standard transformation of the RowExpr, so that "foo." behaves as expected. The key reason we didn't do that before was the hard-wired handling of DEFAULT tokens (SetToDefault nodes). This patch deals with that issue by allowing DEFAULT in any a_expr and having parse analysis throw an error if SetToDefault is found in an unexpected place. That's an improvement anyway since the error can be more specific than just "syntax error". The SQL standard suggests that the RHS could be any a_expr yielding a suitable row value. This patch doesn't really move the goal posts in that respect --- you're still limited to RowExpr or a sub-SELECT --- but it does fix the grammar restriction, so it provides some tangible progress towards a full implementation. And the limitation is now documented by an explicit error message rather than an unhelpful "syntax error". Discussion: <8542.1479742008@sss.pgh.pa.us>	2016-11-22 15:20:10 -05:00
Robert Haas	e8ac886c24	Support condition variables. Condition variables provide a flexible way to sleep until a cooperating process causes an arbitrary condition to become true. In simple cases, this can be accomplished with a WaitLatch/ResetLatch loop; the cooperating process can call SetLatch after performing work that might cause the condition to be satisfied, and the waiting process can recheck the condition each time. However, if the process performing the work doesn't have an easy way to identify which processes might be waiting, this doesn't work, because it can't identify which latches to set. Condition variables solve that problem by internally maintaining a list of waiters; a process that may have caused some waiter's condition to be satisfied must "signal" or "broadcast" on the condition variable. Robert Haas and Thomas Munro	2016-11-22 14:27:11 -05:00
Peter Eisentraut	67dc4ccbb2	Add pg_sequences view Like pg_tables, pg_views, and others, this view contains information about sequences in a way that is independent of the system catalog layout but more comprehensive than the information schema. To help implement the view, add a new internal function pg_sequence_last_value() to return the last value of a sequence. This is kept separate from pg_sequence_parameters() to separate querying run-time state from catalog-like information. Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2016-11-18 14:59:03 -05:00
Robert Haas	b40b4dd9e1	Reserve zero as an invalid DSM handle. Previously, the handle for the control segment could not be zero, but some other DSM segment could potentially have a handle value of zero. However, that means that if someone wanted to store a dsm_handle that might or might not be valid, they would need a separate boolean to keep track of whether the associated value is legal. That's annoying, so change things so that no DSM segment can ever have a handle of 0 - or as we call it here, DSM_HANDLE_INVALID. Thomas Munro. This was submitted as part of a much larger patch to add an malloc-like allocator for dynamic shared memory, but this part seems like a good idea independently of the rest of the patch.	2016-11-15 16:33:29 -05:00
Tom Lane	ffaa44cb55	Account for catalog snapshot in PGXACT->xmin updates. The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting for whether MyPgXact->xmin could be cleared or advanced. In normal transactions this was masked by the fact that the transaction snapshot would be older, but during backend startup and certain utility commands it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had been cleared, meaning that recently-deleted rows could be pruned even though this snapshot could still see them, causing unexpected catalog lookup failures. This effect appears to be the explanation for a recent failure on buildfarm member piculet. To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever it is valid. In the previous logic, it was possible for the CatalogSnapshot to remain valid across waits for client input, but with this change that would mean it delays advance of global xmin in cases where it did not before. To avoid possibly causing new table-bloat problems with clients that sit idle for long intervals, add code to invalidate the CatalogSnapshot before waiting for client input. (When the backend is busy, it's unlikely that the CatalogSnapshot would be the oldest snap for very long, so we don't worry about forcing early invalidation of it otherwise.) In passing, remove the CatalogSnapshotStale flag in favor of using "CatalogSnapshot != NULL" to represent validity, as we do for the other special snapshots in snapmgr.c. And improve some obsolete comments. No regression test because I don't know a deterministic way to cause this failure. But the stress test shown in the original discussion provokes "cache lookup failed for relation 1255" within a few dozen seconds for me. Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's quite easy to produce similar failures with the same test case in branches before 9.4. But MVCC catalog scans were supposed to fix that.) Discussion: <16447.1478818294@sss.pgh.pa.us>	2016-11-15 15:55:35 -05:00
Tom Lane	24aef33804	Cleanup of rewriter and planner handling of Query.hasRowSecurity flag. Be sure to pull up the subquery's hasRowSecurity flag when flattening a subquery in pull_up_simple_subquery(). This isn't a bug today because we don't look at the hasRowSecurity flag during planning, but it could easily be a bug tomorrow. Likewise, make rewriteRuleAction() pull up the hasRowSecurity flag when absorbing RTEs from a rule action. This isn't a bug either, for the opposite reason: the flag should never be set yet. But again, it seems like good future proofing. Add a comment explaining why rewriteTargetView() should not set hasRowSecurity when adding stuff to securityQuals. Improve some nearby comments about securityQuals processing, and document that field more completely in parsenodes.h. Patch by me, analysis by Dean Rasheed. Discussion: <CAEZATCXZ8tb2DV6f=bkhsMV6u_gRcZ0CZBw2J-qU84RxSukZog@mail.gmail.com>	2016-11-10 16:16:33 -05:00
Tom Lane	530f806524	Re-allow user_catalog_table option for materialized views. The reloptions stuff allows this option to be set on a matview. While it's questionable whether that is useful or was really intended, it does work, and we shouldn't change that in minor releases. Commit `e3e66d8a9` disabled the option since I didn't realize that it was possible for it to be set on a matview. Tweak the test to re-allow it. Discussion: <19749.1478711862@sss.pgh.pa.us>	2016-11-10 15:00:58 -05:00
Tom Lane	1833f1a1c3	Simplify code by getting rid of SPI_push, SPI_pop, SPI_restore_connection. The idea behind SPI_push was to allow transitioning back into an "unconnected" state when a SPI-using procedure calls unrelated code that might or might not invoke SPI. That sounds good, but in practice the only thing it does for us is to catch cases where a called SPI-using function forgets to call SPI_connect --- which is a highly improbable failure mode, since it would be exposed immediately by direct testing of said function. As against that, we've had multiple bugs induced by forgetting to call SPI_push/SPI_pop around code that might invoke SPI-using functions; these are much harder to catch and indeed have gone undetected for years in some cases. And we've had to band-aid around some problems of this ilk by introducing conditional push/pop pairs in some places, which really kind of defeats the purpose altogether; if we can't draw bright lines between connected and unconnected code, what's the point? Hence, get rid of SPI_push[_conditional], SPI_pop[_conditional], and the underlying state variable _SPI_curid. It turns out SPI_restore_connection can go away too, which is a nice side benefit since it was never more than a kluge. Provide no-op macros for the deleted functions so as to avoid an API break for external modules. A side effect of this removal is that SPI_palloc and allied functions no longer permit being called when unconnected; they'll throw an error instead. The apparent usefulness of the previous behavior was a mirage as well, because it was depended on by only a few places (which I fixed in preceding commits), and it posed a risk of allocations being unexpectedly long-lived if someone forgot a SPI_push call. Discussion: <20808.1478481403@sss.pgh.pa.us>	2016-11-08 17:39:57 -05:00
Tom Lane	9257f07872	Replace uses of SPI_modifytuple that intend to allocate in current context. Invent a new function heap_modify_tuple_by_cols() that is functionally equivalent to SPI_modifytuple except that it always allocates its result by simple palloc. I chose however to make the API details a bit more like heap_modify_tuple: pass a tupdesc rather than a Relation, and use bool convention for the isnull array. Use this function in place of SPI_modifytuple at all call sites where the intended behavior is to allocate in current context. (There actually are only two call sites left that depend on the old behavior, which makes me wonder if we should just drop this function rather than keep it.) This new function is easier to use than heap_modify_tuple() for purposes of replacing a single column (or, really, any fixed number of columns). There are a number of places where it would simplify the code to change over, but I resisted that temptation for the moment ... everywhere except in plpgsql's exec_assign_value(); changing that might offer some small performance benefit, so I did it. This is on the way to removing SPI_push/SPI_pop, but it seems like good code cleanup in its own right. Discussion: <9633.1478552022@sss.pgh.pa.us>	2016-11-08 15:36:44 -05:00
Tom Lane	e3e66d8a98	Band-aid fix for incorrect use of view options as StdRdOptions. We really ought to make StdRdOptions and the other decoded forms of reloptions self-identifying, but for the moment, assume that only plain relations could possibly be user_catalog_tables. Fixes problem with bogus "ON CONFLICT is not supported on table ... used as a catalog table" error when target is a view with cascade option. Discussion: <26681.1477940227@sss.pgh.pa.us>	2016-11-07 12:08:18 -05:00
Tom Lane	33cb96ba1a	Revert "Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro." This reverts commit `c8ead2a397`. Seems there is no way to do this that doesn't cause MSVC to give warnings, so let's just go back to the way we've been doing it. Discussion: <11843.1478358206@sss.pgh.pa.us>	2016-11-07 10:19:22 -05:00
Tom Lane	86d19d27ce	Remove duplicate macro definition. Seems to be a copy-and-pasteo. Odd that we heard no reports of compiler warnings about it. Thomas Munro	2016-11-05 11:51:46 -04:00
Tom Lane	06f5fd2f4f	pgwin32_is_junction's argument should be "const char " not "char ". We're passing const strings to it in places, and that's not an unreasonable thing to do. Per buildfarm (noted on frogmouth in particular).	2016-11-05 11:14:10 -04:00
Tom Lane	c8ead2a397	Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro. Second try at the change originally made in commit 8518583cd; this time with contrib updates so that manual extern declarations are also marked with PGDLLEXPORT. The release notes should point this out as a significant source-code change for extension authors, since they'll have to make similar additions to avoid trouble on Windows. Laurenz Albe, doc change by me Patch: <A737B7A37273E048B164557ADEF4A58B53962ED8@ntex2010a.host.magwien.gv.at>	2016-11-04 19:04:56 -04:00
Kevin Grittner	8c48375e5f	Implement syntax for transition tables in AFTER triggers. This is infrastructure for the complete SQL standard feature. No support is included at this point for execution nodes or PLs. The intent is to add that soon. As this patch leaves things, standard syntax can create tuplestores to contain old and/or new versions of rows affected by a statement. References to these tuplestores are in the TriggerData structure. C triggers can access the tuplestores directly, so they are usable, but they cannot yet be referenced within a SQL statement.	2016-11-04 10:49:50 -05:00
Robert Haas	f2e6a2ccf1	Add API to check if an existing exclusive lock allows cleanup. LockBufferForCleanup() acquires a cleanup lock unconditionally, and ConditionalLockBufferForCleanup() acquires a cleanup lock if it is possible to do so without waiting; this patch adds a new API, IsBufferCleanupOK(), which tests whether an exclusive lock already held happens to be a cleanup lock. This is possible because a cleanup lock simply means an exclusive lock plus the assurance any other pins on the buffer are newer than our own pin. Therefore, just as the existing functions decide that the exclusive lock that they've just taken is a cleanup lock if they observe the pin count to be 1, this new function allows us to observe that the pin count is 1 on a buffer we've already locked. This is useful in situations where a backend definitely wishes to modify the buffer and also wishes to perform cleanup operations if possible. The patch to eliminate heavyweight locking by hash indexes uses this, and it may have other applications as well. Amit Kapila, per a suggestion from me. Some comment adjustments by me as well.	2016-11-04 09:32:24 -04:00
Robert Haas	6bb9a6177d	Remove declarations for pq_putmessage_hook and pq_flush_hook. Commit `2bd9e412f9` added these in error. They were part of an earlier design for that patch and survived in the committed version only by inadvertency. Julien Rouhaud	2016-10-31 09:14:46 -04:00
Peter Eisentraut	c32fe432af	Avoid using a C++ keyword in header file per cpluspluscheck	2016-10-26 22:41:56 -04:00
Heikki Linnakangas	56f39009c5	Fix typos in comments. Vinayak Pokale	2016-10-26 11:12:31 +03:00
Magnus Hagander	56c7d8d455	Allow pg_basebackup to stream transaction log in tar mode This will write the received transaction log into a file called pg_wal.tar(.gz) next to the other tarfiles instead of writing it to base.tar. When using fetch mode, the transaction log is still written to base.tar like before, and when used against a pre-10 server, the file is named pg_xlog.tar. To do this, implement a new concept of a "walmethod", which is responsible for writing the WAL. Two implementations exist, one that writes to a plain directory (which is also used by pg_receivexlog) and one that writes to a tar file with optional compression. Reviewed by Michael Paquier	2016-10-23 15:23:11 +02:00
Robert Haas	f82ec32ac3	Rename "pg_xlog" directory to "pg_wal". "xlog" is not a particularly clear abbreviation for "write-ahead log", and it sometimes confuses users into believe that the contents of the "pg_xlog" directory are not critical data, leading to unpleasant consequences. So, rename the directory to "pg_wal". This patch modifies pg_upgrade and pg_basebackup to understand both the old and new directory layouts; the former is necessary given the purpose of the tool, while the latter merely avoids an unnecessary backward-compatibility break. We may wish to consider renaming other programs, switches, and functions which still use the old "xlog" naming to also refer to "wal". However, that's still under discussion, so let's do just this much for now. Discussion: CAB7nPqTeC-8+zux8_-4ZD46V7YPwooeFxgndfsq5Rg8ibLVm1A@mail.gmail.com Michael Paquier	2016-10-20 11:32:18 -04:00
Andres Freund	90d3da11c9	Fix a few typos in simplehash.h. Author: Erik Rijkers Discussion: <274e4c8ac545d6622735f97c1f6c354b@xs4all.nl>	2016-10-18 10:55:56 -07:00
Robert Haas	fca41acb86	Fix typo in comment. Amit Langote	2016-10-18 13:43:27 -04:00
Heikki Linnakangas	faae1c918e	Revert "Replace PostmasterRandom() with a stronger way of generating randomness." This reverts commit `9e083fd468`. That was a few bricks shy of a load: * Query cancel stopped working * Buildfarm member pademelon stopped working, because the box doesn't have /dev/urandom nor /dev/random. This clearly needs some more discussion, and a quite different patch, so revert for now.	2016-10-18 16:28:23 +03:00
Heikki Linnakangas	9e083fd468	Replace PostmasterRandom() with a stronger way of generating randomness. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom - /dev/random Original patch by Magnus Hagander, with further work by Michael Paquier and me. Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>	2016-10-17 11:52:50 +03:00
Andres Freund	5dfc198146	Use more efficient hashtable for execGrouping.c to speed up hash aggregation. The more efficient hashtable speeds up hash-aggregations with more than a few hundred groups significantly. Improvements of over 120% have been measured. Due to the the different hash table queries that not fully determined (e.g. GROUP BY without ORDER BY) may change their result order. The conversion is largely straight-forward, except that, due to the static element types of simplehash.h type hashes, the additional data some users store in elements (e.g. the per-group working data for hash aggregaters) is now stored in TupleHashEntryData->additional. The meaning of BuildTupleHashTable's entrysize (renamed to additionalsize) has been changed to only be about the additionally stored size. That size is only used for the initial sizing of the hash-table. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>	2016-10-14 17:22:51 -07:00
Andres Freund	b30d3ea824	Add a macro templatized hashtable. dynahash.c hash tables aren't quite fast enough for some use-cases. There are several reasons for lacking performance: - the use of chaining for collision handling makes them cache inefficient, that's especially an issue when the tables get bigger. - as the element sizes for dynahash are only determined at runtime, offset computations are somewhat expensive - hash and element comparisons are indirect function calls, causing unnecessary pipeline stalls - it's two level structure has some benefits (somewhat natural partitioning), but increases the number of indirections to fix several of these the hash tables have to be adjusted to the individual use-case at compile-time. C unfortunately doesn't provide a good way to do compile code generation (like e.g. c++'s templates for all their weaknesses do). Thus the somewhat ugly approach taken here is to allow for code generation using a macro-templatized header file, which generates functions and types based on a prefix and other parameters. Later patches use this infrastructure to use such hash tables for tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation, ...). In queries where these use up a large fraction of the time, this has been measured to lead to performance improvements of over 100%. There are other cases where this could be useful (e.g. catcache.c). The hash table design chosen is a variant of linear open-addressing. The biggest disadvantage of simple linear addressing schemes are highly variable lookup times due to clustering, and deletions leaving a lot of tombstones around. To address these issues a variant of "robin hood" hashing is employed. Robin hood hashing optimizes chaining lengths by moving elements close to their optimal bucket ("rich" elements), out of the way if a to-be-inserted element is further away from its optimal position (i.e. it's "poor"). While that can make insertions slower, the average lookup performance is a lot better, and higher fill factors can be used in a still performant manner. To avoid tombstones - which normally solve the issue that a deleted node's presence is relevant to determine whether a lookup needs to continue looking or is done - buckets following a deleted element are shifted backwards, unless they're empty or already at their optimal position. There's further possible improvements that can be made to this implementation. Amongst others: - Use distance as a termination criteria during searches. This is generally a good idea, but I've been able to see the overhead of distance calculations in some cases. - Consider combining the 'empty' status into the hashvalue, and enforce storing the hashvalue. That could, in some cases, increase memory density and remove a few instructions. - Experiment further with the, very conservatively choosen, fillfactor. - Make maximum size of hashtable configurable, to allow storing very very large tables. That'd require 64bit hash values to be more common than now, though. - some smaller memcpy calls could be optimized to copy larger chunks But since the new implementation is already considerably faster than dynahash it seem sensible to start using it. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>	2016-10-14 16:07:38 -07:00

1 2 3 4 5 ...

7435 commits