postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-02-27 20:00:50 -05:00

Author	SHA1	Message	Date
Robert Haas	6599c9ac33	Add an Assert() to max_parallel_workers enforcement. To prevent future bugs along the lines of the one corrected by commit `8ff518699f`, or find any that remain in the current code, add an Assert() that the difference between parallel_register_count and parallel_terminate_count is in a sane range. Kuntal Ghosh, with considerable tidying-up by me, per a suggestion from Neha Khatri. Reviewed by Tomas Vondra. Discussion: http://postgr.es/m/CAFO0U+-E8yzchwVnvn5BeRDPgX2z9vZUxQ8dxx9c0XFGBC7N1Q@mail.gmail.com	2017-04-11 13:03:44 -04:00
Fujii Masao	ff7bce1743	Add max_sync_workers_per_subscription to postgresql.conf.sample. This commit also does - add REPLICATION_SUBSCRIBERS into config_group - mark max_logical_replication_workers and max_sync_workers_per_subscription as REPLICATION_SUBSCRIBERS parameters - move those parameters into "Subscribers" section in postgresql.conf.sample Author: Masahiko Sawada, Petr Jelinek and me Reported-by: Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoAonSCoa=v=87ZO3vhfUZA1k_E2XRNHTt=xioWGUa+0ug@mail.gmail.com	2017-04-12 00:10:54 +09:00
Magnus Hagander	a4777f3556	Remove symbol WIN32_ONLY_COMPILER This used to mean "Visual C++ except in those parts where Borland C++ was supported where it meant one of those". Now that we don't support Borland C++ anymore, simplify by using _MSC_VER which is the normal way to detect Visual C++.	2017-04-11 15:22:21 +02:00
Magnus Hagander	6da56f3f84	Remove support for bcc and msvc standalone libpq builds This removes the support for building just libpq using Borland C++ or Visual C++. This has not worked properly for years, and given the number of complaints it's clearly not worth the maintenance burden. Building libpq using the standard MSVC build system is of course still supported, along with mingw.	2017-04-11 15:22:21 +02:00
Tom Lane	8f0530f580	Improve castNode notation by introducing list-extraction-specific variants. This extends the castNode() notation introduced by commit `5bcab1114` to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us	2017-04-10 13:51:53 -04:00
Peter Eisentraut	26ad194cb0	Support configuration reload in logical replication workers Author: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>	2017-04-10 13:42:21 -04:00
Robert Haas	c0a8ae7be3	Fix reporting of violations in ExecConstraints, again. We decided in `f1b4c771ea` to pass the original slot to ExecConstraints(), but that breaks when there are BEFORE ROW triggers involved. So we need to do reverse-map the tuples back to the original descriptor instead, as Amit originally proposed. Amit Langote, reviewed by Ashutosh Bapat. One overlooked comment fixed by me. Discussion: http://postgr.es/m/b3a17254-6849-e542-2353-bde4e880b6a4@lab.ntt.co.jp	2017-04-10 12:20:08 -04:00
Tom Lane	511540dadf	Move isolationtester's is-blocked query into C code for speed. Commit `4deb41381` modified isolationtester's query to see whether a session is blocked to also check for waits occurring in GetSafeSnapshot. However, it did that in a way that enormously increased the query's runtime under CLOBBER_CACHE_ALWAYS, causing the buildfarm members that use that to run about four times slower than before, and in some cases fail entirely. To fix, push the entire logic into a dedicated backend function. This should actually reduce the CLOBBER_CACHE_ALWAYS runtime from what it was previously, though I've not checked that. In passing, expose a SQL function to check for safe-snapshot blockage, comparable to pg_blocking_pids. This is more or less free given the infrastructure built to solve the other problem, so we might as well. Thomas Munro Discussion: https://postgr.es/m/20170407165749.pstcakbc637opkax@alap3.anarazel.de	2017-04-10 10:26:54 -04:00
Kevin Grittner	c63172d60f	Add GUCs for predicate lock promotion thresholds. Defaults match the fixed behavior of prior releases, but now DBAs have better options to tune serializable workloads. It might be nice to be able to set this per relation, but that part will need to wait for another release. Author: Dagfinn Ilmari Mannsåker	2017-04-07 21:38:05 -05:00
Tom Lane	9c7f5229ad	Optimize joins when the inner relation can be proven unique. If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com	2017-04-07 22:20:13 -04:00
Andres Freund	f13a9121f9	Fix issues in `e8fdbd58fe`. When the 64bit atomics simulation is in use, we can't necessarily guarantee the correct alignment of the atomics due to lack of compiler support for doing so- that's fine from a safety perspective, because everything is protected by a lock, but we asserted the alignment in all cases. Weaken them. Per complaint from Alvaro Herrera. My #ifdefery for PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY wasn't sufficient. Fix that. Per complaint from Alexander Korotkov.	2017-04-07 17:09:03 -07:00
Alvaro Herrera	8bf74967da	Reduce the number of pallocs() in BRIN Instead of allocating memory in brin_deform_tuple and brin_copy_tuple over and over during a scan, allow reuse of previously allocated memory. This is said to make for a measurable performance improvement. Author: Jinyu Zhang, Álvaro Herrera Reviewed by: Tomas Vondra Discussion: https://postgr.es/m/495deb78.4186.1500dacaa63.Coremail.beijing_pg@163.com	2017-04-07 19:08:43 -03:00
Andres Freund	e8fdbd58fe	Improve 64bit atomics support. When adding atomics back in `b64d92f1a`, I added 64bit support as optional; there wasn't yet a direct user in sight. That turned out to be a bit short-sighted, it'd already have been useful a number of times. Add a fallback implementation of 64bit atomics, just like the one we have for 32bit atomics. Additionally optimize reads/writes to 64bit on a number of platforms where aligned writes of that size are atomic. This can now be tested with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Author: Andres Freund Reviewed-By: Amit Kapila Discussion: https://postgr.es/m/20160330230914.GH13305@awork2.anarazel.de	2017-04-07 14:48:11 -07:00
Peter Eisentraut	0cb2e51992	Avoid using a C++ keyword in header file per cpluspluscheck	2017-04-07 16:32:02 -04:00
Alvaro Herrera	817cb10013	Fix new BRIN desummarize WAL record The WAL-writing piece was forgetting to set the pages-per-range value. Also, fix the declared type of struct member heapBlk, which I mistakenly set as OffsetNumber rather than BlockNumber. Problem was introduced by commit `c655899ba9` (April 1st). Any system that tries to replay the new WAL record written before this fix is likely to die on replay and require pg_resetwal. Reported by Tom Lane. Discussion: https://postgr.es/m/20191.1491524824@sss.pgh.pa.us	2017-04-07 17:11:56 -03:00
Robert Haas	d4116a7719	Add ProcArrayGroupUpdate wait event. Discussion: http://postgr.es/m/CA+TgmobgWHcXDcChX2+BqJDk2dkPVF85ZrJFhUyHHQmw8diTpA@mail.gmail.com	2017-04-07 13:41:47 -04:00
Heikki Linnakangas	60f11b87a2	Use SASLprep to normalize passwords for SCRAM authentication. An important step of SASLprep normalization, is to convert the string to Unicode normalization form NFKC. Unicode normalization requires a fairly large table of character decompositions, which is generated from data published by the Unicode consortium. The script to generate the table is put in src/common/unicode, as well test code for the normalization. A pre-generated version of the tables is included in src/include/common, so you don't need the code in src/common/unicode to build PostgreSQL, only if you wish to modify the normalization tables. The SASLprep implementation depends on the UTF-8 functions from src/backend/utils/mb/wchar.c. So to use it, you must also compile and link that. That doesn't change anything for the current users of these functions, the backend and libpq, as they both already link with wchar.o. It would be good to move those functions into a separate file in src/commmon, but I'll leave that for another day. No documentation changes included, because there is no details on the SCRAM mechanism in the docs anyway. An overview on that in the protocol specification would probably be good, even though SCRAM is documented in detail in RFC5802. I'll write that as a separate patch. An important thing to mention there is that we apply SASLprep even on invalid UTF-8 strings, to support other encodings. Patch by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqSByyEmAVLtEf1KxTRh=PWNKiWKEKQR=e1yGehz=wbymQ@mail.gmail.com	2017-04-07 14:56:05 +03:00
Simon Riggs	ac2b095088	Reset API of clause_selectivity() Discussion: https://postgr.es/m/CAKJS1f9yurJQW9pdnzL+rmOtsp2vOytkpXKGnMFJEO-qz5O5eA@mail.gmail.com	2017-04-06 19:10:51 -04:00
Andres Freund	fa117ee403	Allow avoiding tuple copy within tuplesort_gettupleslot(). Add a "copy" argument to make it optional to receive a copy of caller tuple that is safe to use following a subsequent manipulating of tuplesort's state. This is a performance optimization. Most existing tuplesort_gettupleslot() callers are made to opt out of copying. Existing callers that happen to rely on the validity of tuple memory beyond subsequent manipulations of the tuplesort request their own copy. This brings tuplesort_gettupleslot() in line with tuplestore_gettupleslot(). In the future, a "copy" tuplesort_getdatum() argument may be added, that similarly allows callers to opt out of receiving their own copy of tuple. In passing, clarify assumptions that callers of other tuplesort fetch routines may make about tuple memory validity, per gripe from Tom Lane. Author: Peter Geoghegan Discussion: CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com	2017-04-06 14:48:59 -07:00
Alvaro Herrera	7e534adcdc	Fix BRIN cost estimation The original code was overly optimistic about the cost of scanning a BRIN index, leading to BRIN indexes being selected when they'd be a worse choice than some other index. This complete rewrite should be more accurate. Author: David Rowley, based on an earlier patch by Emre Hasegeli Reviewed-by: Emre Hasegeli Discussion: https://postgr.es/m/CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com	2017-04-06 17:51:53 -03:00
Peter Eisentraut	5f21f5292c	Mark immutable functions in information schema as parallel safe Also add opr_sanity check that all preloaded immutable functions are parallel safe. (Per discussion, this does not necessarily have to be true for all possible such functions, but deviations would be unlikely enough that maintaining such a test is reasonable.) Reported-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2017-04-06 14:30:13 -04:00
Peter Eisentraut	e6c9a5a9bc	Fix mixup of bool and ternary value Not currently a problem, but could be with stricter bool behavior under stdbool or C++. Reviewed-by: Andres Freund <andres@anarazel.de>	2017-04-06 13:09:42 -04:00
Alvaro Herrera	b1fc51a36e	Comment fixes for extended statistics Clean up some code comments in new extended statistics code, from `7b504eb282`.	2017-04-06 12:28:50 -03:00
Heikki Linnakangas	07044efe00	Remove bogus SCRAM_ITERATION_LEN constant. It was not used for what the comment claimed, at all. It was actually used as the 'base' argument to strtol(), when reading the iteration count. We don't need a constant for base-10, so remove it.	2017-04-06 17:41:48 +03:00
Simon Riggs	cd0cebaf7d	Always SnapshotResetXmin() during ClearTransaction() Avoid corner cases during 2PC with `6bad580d9e`	2017-04-06 10:30:22 -04:00
Peter Eisentraut	3217327053	Identity columns This is the SQL standard-conforming variant of PostgreSQL's serial columns. It fixes a few usability issues that serial columns have: - CREATE TABLE / LIKE copies default but refers to same sequence - cannot add/drop serialness with ALTER TABLE - dropping default does not drop sequence - need to grant separate privileges to sequence - other slight weirdnesses because serial is some kind of special macro Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>	2017-04-06 08:41:37 -04:00
Simon Riggs	6bad580d9e	Avoid SnapshotResetXmin() during AtEOXact_Snapshot() For normal commits and aborts we already reset PgXact->xmin, so we can simply avoid running SnapshotResetXmin() twice. During performance tests by Alexander Korotkov, diagnosis by Andres Freund showed PgXact array as a bottleneck. After manual analysis by me of the code paths that touch those memory locations, I was able to identify extraneous code in the main transaction commit path. Avoiding touching highly contented shmem improves concurrent performance slightly on all workloads, confirmed by tests run by Ashutosh Sharma and Alexander Korotkov. Simon Riggs Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com	2017-04-06 08:31:52 -04:00
Heikki Linnakangas	fd01983594	Remove dead code and fix comments in fast-path function handling. HandleFunctionRequest() is no longer responsible for reading the protocol message from the client, since commit `2b3a8b20c2`. Fix the outdated comments. HandleFunctionRequest() now always returns 0, because the code that used to return EOF was moved in `2b3a8b20c2`. Therefore, the caller no longer needs to check the return value. Reported by Andres Freund. Backpatch to all supported versions, even though this doesn't have any user-visible effect, to make backporting future patches in this area easier. Discussion: https://www.postgresql.org/message-id/20170405010525.rt5azbya5fkbhvrx@alap3.anarazel.de	2017-04-06 09:09:39 +03:00
Tom Lane	df1a699e5b	Fix integer-overflow problems in interval comparison. When using integer timestamps, the interval-comparison functions tried to compute the overall magnitude of an interval as an int64 number of microseconds. As reported by Frazer McLean, this overflows for intervals exceeding about 296000 years, which is bad since we nominally allow intervals many times larger than that. That results in wrong comparison results, and possibly in corrupted btree indexes for columns containing such large interval values. To fix, compute the magnitude as int128 instead. Although some compilers have native support for int128 calculations, many don't, so create our own support functions that can do 128-bit addition and multiplication if the compiler support isn't there. These support functions are designed with an eye to allowing the int128 code paths in numeric.c to be rewritten for use on all platforms, although this patch doesn't do that, or even provide all the int128 primitives that will be needed for it. Back-patch as far as 9.4. Earlier releases did not guard against overflow of interval values at all (commit `146604ec4` fixed that), so it seems not very exciting to worry about overly-large intervals for them. Before 9.6, we did not assume that unreferenced "static inline" functions would not draw compiler warnings, so omit functions not directly referenced by timestamp.c, the only present consumer of int128.h. (We could have omitted these functions in HEAD too, but since they were written and debugged on the way to the present patch, and they look likely to be needed by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent such warnings in a --disable-integer-datetimes build, though. Before 9.5, configure will never define HAVE_INT128, so the part of int128.h that exploits a native int128 implementation is dead code in the 9.4 branch. I didn't bother to remove it, thinking that keeping the file looking similar in different branches is more useful. In HEAD only, add a simple test harness for int128.h in src/tools/. In back branches, this does not change the float-timestamps code path. That's not subject to the same kind of overflow risk, since it computes the interval magnitude as float8. (No doubt, when this code was originally written, overflow was disregarded for exactly that reason.) There is a precision hazard instead :-(, but we'll avert our eyes from that question, since no complaints have been reported and that code's deprecated anyway. Kyotaro Horiguchi and Tom Lane Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com	2017-04-05 23:51:27 -04:00
Simon Riggs	2686ee1b7c	Collect and use multi-column dependency stats Follow on patch in the multi-variate statistics patch series. CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t; ANALYZE; will collect dependency stats on (a, b) and then use the measured dependency in subsequent query planning. Commit `7b504eb282` added CREATE STATISTICS with n-distinct coefficients. These are now specified using the mutually exclusive option WITH (ndistinct). Author: Tomas Vondra, David Rowley Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas and many other comments and contributions Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com	2017-04-05 18:00:42 -04:00
Peter Eisentraut	afd79873a0	Capitalize names of PLs consistently Author: Daniel Gustafsson <daniel@yesql.se>	2017-04-05 00:38:25 -04:00
Peter Eisentraut	193f5f9e91	pageinspect: Add bt_page_items function with bytea argument Author: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>	2017-04-04 23:52:55 -04:00
Kevin Grittner	5ebeb579b9	Follow-on cleanup for the transition table patch. Commit `59702716` added transition table support to PL/pgsql so that SQL queries in trigger functions could access those transient tables. In order to provide the same level of support for PL/perl, PL/python and PL/tcl, refactor the relevant code into a new function SPI_register_trigger_data. Call the new function in the trigger handler of all four PLs, and document it as a public SPI function so that authors of out-of-tree PLs can do the same. Also get rid of a second QueryEnvironment object that was maintained by PL/pgsql. That was previously used to deal with cursors, but the same approach wasn't appropriate for PLs that are less tangled up with core code. Instead, have SPI_cursor_open install the connection's current QueryEnvironment, as already happens for SPI_execute_plan. While in the docs, remove the note that transition tables were only supported in C and PL/pgSQL triggers, and correct some ommissions. Thomas Munro with some work by Kevin Grittner (mostly docs)	2017-04-04 18:36:39 -05:00
Simon Riggs	9a3215026b	Make min_wal_size/max_wal_size use MB internally Previously they were defined using multiples of XLogSegSize. Remove GUC_UNIT_XSEGS. Introduce GUC_UNIT_MB Extracted from patch series on XLogSegSize infrastructure. Beena Emerson	2017-04-04 18:00:01 -04:00
Simon Riggs	728bd991c3	Speedup 2PC recovery by skipping two phase state files in normal path 2PC state info held in shmem at PREPARE, then cleaned at COMMIT PREPARED/ABORT PREPARED, avoiding writing/fsyncing any state information to disk in the normal path, greatly enhancing replay speed. Prepared transactions that live past one checkpoint redo horizon will be written to disk as now. Similar conceptually to `978b2f65aa` and building upon the infrastructure created by that commit. Authors, in equal measure: Stas Kelvich, Nikhil Sontakke and Michael Paquier Discussion: https://postgr.es/m/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=VjoMnRnW5s+fKST3w@mail.gmail.com	2017-04-04 15:56:56 -04:00
Robert Haas	ea69a0dead	Expand hash indexes more gradually. Since hash indexes typically have very few overflow pages, adding a new splitpoint essentially doubles the on-disk size of the index, which can lead to large and abrupt increases in disk usage (and perhaps long delays on occasion). To mitigate this problem to some degree, divide larger splitpoints into four equal phases. This means that, for example, instead of growing from 4GB to 8GB all at once, a hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which is perhaps still not as smooth as we'd like but certainly an improvement. This changes the on-disk format of the metapage, so bump HASH_VERSION from 2 to 3. This will force a REINDEX of all existing hash indexes, but that's probably a good idea anyway. First, hash indexes from pre-10 versions of PostgreSQL could easily be corrupted, and we don't want to confuse corruption carried over from an older release with any corruption caused despite the new write-ahead logging in v10. Second, it will let us remove some backward-compatibility code added by commit `293e24e507`. Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression test outputs updated by me. Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com	2017-04-03 23:46:33 -04:00
Robert Haas	7a39b5e4d1	Abstract logic to allow for multiple kinds of child rels. Currently, the only type of child relation is an "other member rel", which is the child of a baserel, but in the future joins and even upper relations may have child rels. To facilitate that, introduce macros that test to test for particular RelOptKind values, and use them in various places where they help to clarify the sense of a test. (For example, a test may allow RELOPT_OTHER_MEMBER_REL either because it intends to allow child rels, or because it intends to allow simple rels.) Also, remove find_childrel_top_parent, which will not work for a child rel that is not a baserel. Instead, add a new RelOptInfo member top_parent_relids to track the same kind of information in a more generic manner. Ashutosh Bapat, slightly tweaked by me. Review and testing of the patch set from which this was taken by Rajkumar Raghuwanshi and Rafia Sabih. Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com	2017-04-03 22:41:31 -04:00
Peter Eisentraut	9fa6e08d4a	Make header self-contained Add necessary include files for things used in the header.	2017-04-03 16:17:45 -04:00
Tom Lane	f833c847b8	Allow psql variable substitution to occur in backtick command strings. Previously, text between backquotes in a psql metacommand's arguments was always passed to the shell literally. That considerably hobbles the usefulness of the feature for scripting, so we'd foreseen for a long time that we'd someday want to allow substitution of psql variables into the shell command. IMO the addition of \if metacommands has brought us to that point, since \if can greatly benefit from some sort of client-side expression evaluation capability, and psql itself is not going to grow any such thing in time for v10. Hence, this patch. It allows :VARIABLE to be replaced by the exact contents of the named variable, while :'VARIABLE' is replaced by the variable's contents suitably quoted to become a single shell-command argument. (The quoting rules for that are different from those for SQL literals, so this is a bit of an abuse of the :'VARIABLE' notation, but I doubt anyone will be confused.) As with other situations in psql, no substitution occurs if the word following a colon is not a known variable name. That limits the risk of compatibility problems for existing psql scripts; but the risk isn't zero, so this needs to be called out in the v10 release notes. Discussion: https://postgr.es/m/9561.1490895211@sss.pgh.pa.us	2017-04-01 21:44:54 -04:00
Kevin Grittner	41bd155dd6	Fix two undocumented parameters to functions from ENR patch. On ProcessUtility document the parameter, to match others. On CreateCachedPlan drop the queryEnv parameter. It was not referenced within the function, and had been added on the assumption that with some unknown future usage of QueryEnvironment it might be useful to do something there. We have avoided other "just in case" implementation of unused paramters, so drop it here. Per gripe from Tom Lane	2017-04-01 15:21:05 -05:00
Alvaro Herrera	c655899ba9	BRIN de-summarization When the BRIN summary tuple for a page range becomes too "wide" for the values actually stored in the table (because the tuples that were present originally are no longer present due to updates or deletes), it can be useful to remove the outdated summary tuple, so that a future summarization can install a tighter summary. This commit introduces a SQL-callable interface to do so. Author: Álvaro Herrera Reviewed-by: Eiji Seki Discussion: https://postgr.es/m/20170228045643.n2ri74ara4fhhfxf@alvherre.pgsql	2017-04-01 16:10:04 -03:00
Alvaro Herrera	7526e10224	BRIN auto-summarization Previously, only VACUUM would cause a page range to get initially summarized by BRIN indexes, which for some use cases takes too much time since the inserts occur. To avoid the delay, have brininsert request a summarization run for the previous range as soon as the first tuple is inserted into the first page of the next range. Autovacuum is in charge of processing these requests, after doing all the regular vacuuming/ analyzing work on tables. This doesn't impose any new tasks on autovacuum, because autovacuum was already in charge of doing summarizations. The only actual effect is to change the timing, i.e. that it occurs earlier. For this reason, we don't go any great lengths to record these requests very robustly; if they are lost because of a server crash or restart, they will happen at a later time anyway. Most of the new code here is in autovacuum, which can now be told about "work items" to process. This can be used for other things such as GIN pending list cleaning, perhaps visibility map bit setting, both of which are currently invoked during vacuum, but do not really depend on vacuum taking place. The requests are at the page range level, a granularity for which we did not have SQL-level access; we only had index-level summarization requests via brin_summarize_new_values(). It seems reasonable to add SQL-level access to range-level summarization too, so add a function brin_summarize_range() to do that. Authors: Álvaro Herrera, based on sketch from Simon Riggs. Reviewed-by: Thomas Munro. Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql	2017-04-01 14:00:53 -03:00
Kevin Grittner	18ce3a4ab2	Add infrastructure to support EphemeralNamedRelation references. A QueryEnvironment concept is added, which allows new types of objects to be passed into queries from parsing on through execution. At this point, the only thing implemented is a collection of EphemeralNamedRelation objects -- relations which can be referenced by name in queries, but do not exist in the catalogs. The only type of ENR implemented is NamedTuplestore, but provision is made to add more types fairly easily. An ENR can carry its own TupleDesc or reference a relation in the catalogs by relid. Although these features can be used without SPI, convenience functions are added to SPI so that ENRs can easily be used by code run through SPI. The initial use of all this is going to be transition tables in AFTER triggers, but that will be added to each PL as a separate commit. An incidental effect of this patch is to produce a more informative error message if an attempt is made to modify the contents of a CTE from a referencing DML statement. No tests previously covered that possibility, so one is added. Kevin Grittner and Thomas Munro Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro with valuable comments and suggestions from many others	2017-03-31 23:17:18 -05:00
Robert Haas	7d8f6986b8	Fix parallel query so it doesn't spoil row estimates above Gather. Commit `45be99f8cd` removed GatherPath's num_workers field, but this is entirely bogus. Normally, a path's parallel_workers flag is supposed to indicate the number of workers that it wants, and should be 0 for a non-partial path. In that commit, I mistakenly thought that GatherPath could also use that field to indicate the number of workers that it would try to start, but that's disastrous, because then it can propagate up to higher nodes in the plan tree, which will then get incorrect rowcounts because the parallel_workers flag is involved in computing those values. Repair by putting the separate field back. Report by Tomas Vondra. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com	2017-03-31 21:01:20 -04:00
Robert Haas	2113ac4cbb	Don't use bgw_main even to specify in-core bgworker entrypoints. On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com	2017-03-31 20:43:32 -04:00
Robert Haas	c94e6942ce	Don't allocate storage for partitioned tables. Also, don't allow setting reloptions on them, since that would have no effect given the lack of storage. The patch does this by introducing a new reloption kind for which there are currently no reloptions -- we might have some in the future -- so it adjusts parseRelOptions to handle that case correctly. Bumped catversion. System catalogs that contained reloptions for partitioned tables are no longer valid; plus, there are now fewer physical files on disk, which is not technically a catalog change but still a good reason to re-initdb. Amit Langote, reviewed by Maksim Milyutin and Kyotaro Horiguchi and revised a bit by me. Discussion: http://postgr.es/m/20170331.173326.212311140.horiguchi.kyotaro@lab.ntt.co.jp	2017-03-31 16:28:51 -04:00
Andrew Dunstan	e306df7f9c	Full Text Search support for json and jsonb The new functions are ts_headline() and to_tsvector. Dmitry Dolgov, edited and documented by me.	2017-03-31 14:26:03 -04:00
Andrew Dunstan	c80b9920fc	Transform or iterate over json(b) string values Dmitry Dolgov, reviewed and lightly edited by me.	2017-03-31 14:25:25 -04:00
Simon Riggs	25fff40798	Default monitoring roles Three nologin roles with non-overlapping privs are created by default * pg_read_all_settings - read all GUCs. * pg_read_all_stats - pg_stat_, pg_database_size(), pg_tablespace_size() pg_stat_scan_tables - may lock/scan tables Top level role - pg_monitor includes all of the above by default, plus others Author: Dave Page Reviewed-by: Stephen Frost, Robert Haas, Peter Eisentraut, Simon Riggs	2017-03-30 14:18:53 -04:00
Andres Freund	5ded4bd214	Remove support for version-0 calling conventions. The V0 convention is failure prone because we've so far assumed that a function is V0 if PG_FUNCTION_INFO_V1 is missing, leading to crashes if a function was coded against the V1 interface. V0 doesn't allow proper NULL, SRF and toast handling. V0 doesn't offer features that V1 doesn't. Thus remove V0 support and obsolete fmgr README contents relating to it. Author: Andres Freund, with contributions by Peter Eisentraut & Craig Ringer Reviewed-By: Peter Eisentraut, Craig Ringer Discussion: https://postgr.es/m/20161208213441.k3mbno4twhg2qf7g@alap3.anarazel.de	2017-03-30 06:25:46 -07:00
Teodor Sigaev	f90d23d0c5	Implement SortSupport for macaddr data type Introduces a scheme to produce abbreviated keys for the macaddr type. Bump catalog version. Author: Brandur Leach Reviewed-by: Julien Rouhaud, Peter Geoghegan https://commitfest.postgresql.org/13/743/	2017-03-29 23:28:56 +03:00
Peter Eisentraut	4fdb8a82e3	Update copyright year in recently added files Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-03-29 14:54:10 -04:00
Robert Haas	9a09527164	Mark more functions parallel-restricted. Commit `61c2e1a95f` allowed parallel query to be used in more places, revealing via buildfarm member mandrill that several functions intended to be called from triggers were incorrectly marked parallel-safe rather than parallel-restricted. Report by Tom Lane. Patch by Rafia Sabih. Reviewed by me. Discussion: http://postgr.es/m/16061.1490479253@sss.pgh.pa.us	2017-03-29 10:59:27 -04:00
Peter Eisentraut	3582b223d4	Fix hardcoded typeof check result for Windows The test result that I had blindly stipulated didn't work out on the build farm, so disable the feature in Windows MSVC for now.	2017-03-29 08:55:34 -04:00
Peter Eisentraut	4cb824699e	Cast result of copyObject() to correct type copyObject() is declared to return void , which allows easily assigning the result independent of the input, but it loses all type checking. If the compiler supports typeof or something similar, cast the result to the input type. This creates a greater amount of type safety. In some cases, where the result is assigned to a generic type such as Node or Expr *, new casts are now necessary, but in general casts are now unnecessary in the normal case and indicate that something unusual is happening. Reviewed-by: Mark Dilger <hornschnorter@gmail.com>	2017-03-28 21:59:23 -04:00
Alvaro Herrera	ce96ce60ca	Remove direct uses of ItemPointer.{ip_blkid,ip_posid} There are no functional changes here; this simply encapsulates knowledge of the ItemPointerData struct so that a future patch can change things without more breakage. All direct users of ip_blkid and ip_posid are changed to use existing macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber respectively. For callers where that's inappropriate (because they Assert that the itempointer is is valid-looking), add ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck, which lack the assertion but are otherwise identical. Author: Pavan Deolasee Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com	2017-03-28 19:02:23 -03:00
Teodor Sigaev	ab89e465cb	Altering default privileges on schemas Extend ALTER DEFAULT PRIVILEGES command to schemas. Author: Matheus Oliveira Reviewed-by: Petr Jelínek, Ashutosh Sharma https://commitfest.postgresql.org/13/887/	2017-03-28 18:58:55 +03:00
Simon Riggs	ff539da316	Cleanup slots during drop database Automatically drop all logical replication slots associated with a database when the database is dropped. Previously we threw an ERROR if a slot existed. Now we throw ERROR only if a slot is active in the database being dropped. Craig Ringer	2017-03-28 10:05:21 -04:00
Alvaro Herrera	6462238f0d	Fix uninitialized memory propagation mistakes Valgrind complains that some uninitialized bytes are being passed around by the extended statistics code since commit `7b504eb282`, as reported by Andres Freund. Silence it. Tomas Vondra submitted a patch which he verified to fix the complaints in his machine; however I messed with it a bit before pushing, so any remaining problems are likely my (Álvaro's) fault. Author: Tomas Vondra Discussion: https://postgr.es/m/20170325211031.4xxoptigqxm2emn2@alap3.anarazel.de	2017-03-27 14:52:19 -03:00
Robert Haas	c4c51541e2	Still more code review for single-page hash vacuuming. Most seriously, fix use of incorrect block ID, per a report from Jeff Janes that it causes a crash and a diagnosis from Amit Kapila. Improve consistency between the hash and btree versions of this code by adding back a PANIC that btree has, and by registering data in the xlog record in the same way, per complaints from Jeff Janes and Amit Kapila. Tidy up some minor cosmetic points, per complaints from Amit Kapila. Patch by Ashutosh Sharma, reviewed by Amit Kapila, and tested by Jeff Janes. Discussion: http://postgr.es/m/CAMkU=1w-9Qe=Ff1o6bSaXpNO9wqpo7_9GL8_CVhw4BoVVHasqg@mail.gmail.com	2017-03-27 12:51:10 -04:00
Teodor Sigaev	1b02be21f2	Fsync directory after creating or unlinking file. If file was created/deleted just before powerloss it's possible that file system will miss that. To prevent it, call fsync() where creating/ unlinkg file is critical. Author: Michael Paquier Reviewed-by: Ashutosh Bapat, Takayuki Tsunakawa, me	2017-03-27 19:33:01 +03:00
Andrew Gierth	b5635948ab	Support hashed aggregation with grouping sets. This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk	2017-03-27 04:20:54 +01:00
Robert Haas	fc70a4b0df	Show more processes in pg_stat_activity. Previously, auxiliary processes and background workers not connected to a database (such as the logical replication launcher) weren't shown. Include them, so that we can see the associated wait state information. Add a new column to identify the processes type, so that people can filter them out easily using SQL if they wish. Before this patch was written, there was discussion about whether we should expose this information in a separate view, so as to avoid contaminating pg_stat_activity with things people might not want to see. But putting everything in pg_stat_activity was a more popular choice, so that's what the patch does. Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier. Some revisions and bug fixes by me. Discussion: http://postgr.es/m/CA+TgmoYES5nhkEGw9nZXU8_FhA8XEm8NTm3-SO+3ML1B81Hkww@mail.gmail.com	2017-03-26 22:02:22 -04:00
Tom Lane	2f0903ea19	Improve performance of ExecEvalWholeRowVar. In commit `b8d7f053c`, we needed to fix ExecEvalWholeRowVar to not change the state of the slot it's copying. The initial quick hack at that required two rounds of tuple construction, which is not very nice. To fix, add another primitive to tuptoaster.c that does precisely what we need. (I initially tried to do this by refactoring one of the existing functions into two pieces; but it looked like that might hurt performance for the existing case, and the amount of code that could be shared is not very large, so I gave up on that.) Discussion: https://postgr.es/m/26088.1490315792@sss.pgh.pa.us	2017-03-26 19:14:57 -04:00
Peter Eisentraut	895f93701f	Fix cpluspluscheck warning Structure tag cannot be the same as a typedef that is a pointer to that struct.	2017-03-26 18:31:05 -04:00
Tom Lane	5459cfd3ad	Fix typos in logical replication support for initial data copy. Fix an incorrect assert condition (noted by Coverity), and spell the new name of the function correctly. Typos introduced in commit `7c4f52409`. Michael Paquier	2017-03-26 17:44:35 -04:00
Andres Freund	b8d7f053c5	Faster expression evaluation and targetlist projection. This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure for to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work has already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too. The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked, is now evaluated once during expression initialization, previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which caches ExprStates, the old set could stick around longer. The behavior around might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de	2017-03-25 14:52:06 -07:00
Simon Riggs	5737c12df0	Report catalog_xmin separately in hot_standby_feedback If the upstream walsender is using a physical replication slot, store the catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a slot and has only a PGPROC entry behaviour doesn't change, as we store the combined xmin and catalog_xmin in the PGPROC entry. Author: Craig Ringer	2017-03-25 14:07:27 +00:00
Peter Eisentraut	7678fe1c6d	Make header self-contained Add necessary include files for things used in the header.	2017-03-24 23:36:10 -04:00
Simon Riggs	3428ef7911	Reverting `42b4b0b241` Buildfarm issues and other reported issues	2017-03-24 17:56:17 +00:00
Alvaro Herrera	7b504eb282	Implement multivariate n-distinct coefficients Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql	2017-03-24 14:06:10 -03:00
Robert Haas	857ee8e391	Add a txid_status function. If your connection to the database server is lost while a COMMIT is in progress, it may be difficult to figure out whether the COMMIT was successful or not. This function will tell you, provided that you don't wait too long to ask. It may be useful in other situations, too. Craig Ringer, reviewed by Simon Riggs and by me Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com	2017-03-24 12:00:53 -04:00
Simon Riggs	42b4b0b241	Avoid SnapshotResetXmin() during AtEOXact_Snapshot() For normal commits and aborts we already reset PgXact->xmin Avoiding touching highly contented shmem improves concurrent performance. Simon Riggs Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com	2017-03-24 14:20:59 +00:00
Heikki Linnakangas	7ac955b347	Allow SCRAM authentication, when pg_hba.conf says 'md5'. If a user has a SCRAM verifier in pg_authid.rolpassword, there's no reason we cannot attempt to perform SCRAM authentication instead of MD5. The worst that can happen is that the client doesn't support SCRAM, and the authentication will fail. But previously, it would fail for sure, because we would not even try. SCRAM is strictly more secure than MD5, so there's no harm in trying it. This allows for a more graceful transition from MD5 passwords to SCRAM, as user passwords can be changed to SCRAM verifiers incrementally, without changing pg_hba.conf. Refactor the code in auth.c to support that better. Notably, we now have to look up the user's pg_authid entry before sending the password challenge, also when performing MD5 authentication. Also simplify the concept of a "doomed" authentication. Previously, if a user had a password, but it had expired, we still performed SCRAM authentication (but always returned error at the end) using the salt and iteration count from the expired password. Now we construct a fake salt, like we do when the user doesn't have a password or doesn't exist at all. That simplifies get_role_password(), and we can don't need to distinguish the "user has expired password", and "user does not exist" cases in auth.c. On second thoughts, also rename uaSASL to uaSCRAM. It refers to the mechanism specified in pg_hba.conf, and while we use SASL for SCRAM authentication at the protocol level, the mechanism should be called SCRAM, not SASL. As a comparison, we have uaLDAP, even though it looks like the plain 'password' authentication at the protocol level. Discussion: https://www.postgresql.org/message-id/6425.1489506016@sss.pgh.pa.us Reviewed-by: Michael Paquier	2017-03-24 13:32:21 +02:00
Teodor Sigaev	78874531ba	Fix backup canceling Assert-enabled build crashes but without asserts it works by wrong way: it may not reset forcing full page write and preventing from starting exclusive backup with the same name as cancelled. Patch replaces pair of booleans nonexclusive_backup_running/exclusive_backup_running to single enum to correctly describe backup state. Backpatch to 9.6 where bug was introduced Reported-by: David Steele Authors: Michael Paquier, David Steele Reviewed-by: Anastasia Lubennikova https://commitfest.postgresql.org/13/1068/	2017-03-24 13:53:40 +03:00
Tom Lane	457a444873	Avoid syntax error on platforms that have neither LOCALE_T nor ICU. Buildfarm member anole sees this union as empty, and doesn't like it.	2017-03-23 23:18:52 -04:00
Robert Haas	c23b186ae9	Fix enum definition. Commit `249cf070e3` assigned to one of the labels in the middle the value that should have been assigned to the first member of the enum. Rushabh's patch didn't have that defect as submitted, but I managed to mess it up while editing. Repair.	2017-03-23 16:10:43 -04:00
Peter Eisentraut	eccfef81e1	ICU support Add a column collprovider to pg_collation that determines which library provides the collation data. The existing choices are default and libc, and this adds an icu choice, which uses the ICU4C library. The pg_locale_t type is changed to a union that contains the provider-specific locale handles. Users of locale information are changed to look into that struct for the appropriate handle to use. Also add a collversion column that records the version of the collation when it is created, and check at run time whether it is still the same. This detects potentially incompatible library upgrades that can corrupt indexes and other structures. This is currently only supported by ICU-provided collations. initdb initializes the default collation set as before from the `locale -a` output but also adds all available ICU locales with a "-x-icu" appended. Currently, ICU-provided collations can only be explicitly named collations. The global database locales are still always libc-provided. ICU support is enabled by configure --with-icu. Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2017-03-23 15:28:48 -04:00
Robert Haas	ea42cc18c3	Track the oldest XID that can be safely looked up in CLOG. This provides infrastructure for looking up arbitrary, user-supplied XIDs without a risk of scary-looking failures from within the clog module. Normally, the oldest XID that can be safely looked up in CLOG is the same as the oldest XID that can reused without causing wraparound, and the latter is already tracked. However, while truncation is in progress, the values are different, so we must keep track of them separately. Craig Ringer, reviewed by Simon Riggs and by me. Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com	2017-03-23 14:26:31 -04:00
Robert Haas	691b8d5928	Allow for parallel execution whenever ExecutorRun() is done only once. Previously, it was unsafe to execute a plan in parallel if ExecutorRun() might be called with a non-zero row count. However, it's quite easy to fix things up so that we can support that case, provided that it is known that we will never call ExecutorRun() a second time for the same QueryDesc. Add infrastructure to signal this, and cross-checks to make sure that a caller who claims this is true doesn't later reneg. While that pattern never happens with queries received directly from a client -- there's no way to know whether multiple Execute messages will be sent unless the first one requests all the rows -- it's pretty common for queries originating from procedural languages, which often limit the result to a single tuple or to a user-specified number of tuples. This commit doesn't actually enable parallelism in any additional cases, because currently none of the places that would be able to benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the first place, but it makes it much more palatable to pass CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it eliminates some cases where we'd end up having to run the parallel plan serially. Patch by me, based on some ideas from Rafia Sabih and corrected by Rafia Sabih based on feedback from Dilip Kumar and myself. Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com	2017-03-23 13:14:36 -04:00
Teodor Sigaev	218f51584d	Reduce page locking in GIN vacuum GIN vacuum during cleaning posting tree can lock this whole tree for a long time with by holding LockBufferForCleanup() on root. Patch changes it with two ways: first, cleanup lock will be taken only if there is an empty page (which should be deleted) and, second, it tries to lock only subtree, not the whole posting tree. Author: Andrey Borodin with minor editorization by me Reviewed-by: Jeff Davis, me https://commitfest.postgresql.org/13/896/	2017-03-23 19:38:47 +03:00
Peter Eisentraut	73561013e5	Remove trailing comma from enum definition Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-23 11:58:11 -04:00
Peter Eisentraut	128e6ee01d	Assorted compilation and test fixes related to `7c4f52409a`, per build farm Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-23 11:44:43 -04:00
Simon Riggs	6912acc04f	Replication lag tracking for walsenders Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication. Implements a lag tracker module that reports the lag times based upon measurements of the time taken for recent WAL to be written, flushed and replayed and for the sender to hear about it. These times represent the commit lag that was (or would have been) introduced by each synchronous commit level, if the remote server was configured as a synchronous standby. For an asynchronous standby, the replay_lag column approximates the delay before recent transactions became visible to queries. If the standby server has entirely caught up with the sending server and there is no more WAL activity, the most recently measured lag times will continue to be displayed for a short time and then show NULL. Physical replication lag tracking is automatic. Logical replication tracking is possible but is the responsibility of the logical decoding plugin. Tracking is a private module operating within each walsender individually, with values reported to shared memory. Module not used outside of walsender. Design and code is good enough now to commit - kudos to the author. In many ways a difficult topic, with important and subtle behaviour so this shoudl be expected to generate discussion and multiple open items: Test now! Author: Thomas Munro, following designs by Fujii Masao and Simon Riggs Review: Simon Riggs, Ian Barwick and Craig Ringer	2017-03-23 14:05:28 +00:00
Peter Eisentraut	7c4f52409a	Logical replication support for initial data copy Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>	2017-03-23 08:55:37 -04:00
Stephen Frost	017e4f2588	Expose waitforarchive option through pg_stop_backup() Internally, we have supported the option to either wait for all of the WAL associated with a backup to be archived, or to return immediately. This option is useful to users of pg_stop_backup() as well, when they are reading the stop backup record position and checking that the WAL they need has been archived independently. This patch adds an additional, optional, argument to pg_stop_backup() which allows the user to indicate if they wish to wait for the WAL to be archived or not. The default matches current behavior, which is to wait. Author: David Steele, with some minor changes, doc updates by me. Reviewed by: Takayuki Tsunakawa, Fujii Masao Discussion: https://postgr.es/m/758e3fd1-45b4-5e28-75cd-e9e7f93a4c02@pgmasters.net	2017-03-22 23:44:58 -04:00
Magnus Hagander	6b76f1bb58	Support multiple RADIUS servers This changes all the RADIUS related parameters (radiusserver, radiussecret, radiusport, radiusidentifier) to be plural and to accept a comma separated list of servers, which will be tried in order. Reviewed by Adam Brightwell	2017-03-22 18:11:08 +01:00
Simon Riggs	af4b1a0869	Refactor GetOldestXmin() to use flags Replace ignoreVacuum parameter with more flexible flags. Author: Eiji Seki Review: Haribabu Kommi	2017-03-22 16:51:01 +00:00
Andrew Dunstan	96a7128b7b	Sync pg_dump and pg_dumpall output Before exiting any files are fsync'ed. A --no-sync option is also provided for a faster exit if desired. Michael Paquier. Reviewed by Albe Laurenz Discussion: https://postgr.es/m/CAB7nPqS1uZ=Ov+UruW6jr3vB-S_DLVMPc0dQpV-fTDjmm0ZQMg@mail.gmail.com	2017-03-22 10:20:13 -04:00
Simon Riggs	9b013dc238	Improve performance of replay of AccessExclusiveLocks A hot standby replica keeps a list of Access Exclusive locks for a top level transaction. These locks are released when the top level transaction ends. Searching of this list is O(N^2), and each transaction had to pay the price of searching this list for locks, even if it didn't take any AE locks itself. This patch optimizes this case by having the master server track which transactions took AE locks, and passes that along to the standby server in the commit/abort record. This allows the standby to only try to release locks for transactions which actually took any, avoiding the majority of the performance issue. Refactor MyXactAccessedTempRel into MyXactFlags to allow minimal additional cruft with this. Analysis and initial patch by David Rowley Author: David Rowley and Simon Riggs	2017-03-22 13:09:36 +00:00
Simon Riggs	1148e22a82	Teach xlogreader to follow timeline switches Uses page-based mechanism to ensure we’re using the correct timeline. Tests are included to exercise the functionality using a cold disk-level copy of the master that's started up as a replica with slots intact, but the intended use of the functionality is with later features. Craig Ringer, reviewed by Simon Riggs and Andres Freund	2017-03-22 07:05:12 +00:00
Robert Haas	d3cc37f1d8	Don't scan partitioned tables. Partitioned tables do not contain any data; only their unpartitioned descendents need to be scanned. However, the partitioned tables still need to be locked, even though they're not scanned. To make that work, Append and MergeAppend relations now need to carry a list of (unscanned) partitioned relations that must be locked, and InitPlan must lock all partitioned result relations. Aside from the obvious advantage of avoiding some work at execution time, this has two other advantages. First, it may improve the planner's decision-making in some cases since the empty relation might throw things off. Second, it paves the way to getting rid of the storage for partitioned tables altogether. Amit Langote, reviewed by me. Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp	2017-03-21 09:48:04 -04:00
Andrew Dunstan	29bf501683	Add a direct function call mechanism using the caller's context. The current DirectFunctionCall functions use NULL as the flinfo in initializing the FunctionCallInfoData for the call. That means the called function has no fn_mcxt or fn_extra to work with, and attempting to do so will result in an access violation. These functions instead use the provided flinfo, which will usually be the caller's own flinfo. The caller needs to ensure that it doesn't use the fn_extra in way that is incompatible with the way the called function will use it. The called function should not rely on anything else in the provided context, as it will be relevant to the caller, not the callee. Original code from Tom Lane. Discussion: https://postgr.es/m/db2b70a4-78d7-294a-a315-8e7f506c5978@2ndQuadrant.com	2017-03-21 08:57:46 -04:00
Andrew Dunstan	b6fb534f10	Add IF NOT EXISTS for CREATE SERVER and CREATE USER MAPPING There is still some inconsistency with the error messages surrounding foreign servers. Some use the word "foreign" and some don't. My inclination is to remove all such uses of "foreign" on the basis that the CREATE/ALTER/DROP SERVER commands don't use the word. However, that is left for another day. In this patch I have kept to the existing usage in the affected commands, which omits "foreign". Anastasia Lubennikova, reviewed by Arthur Zakirov and Ashtosh Bapat. Discussion: http://postgr.es/m/7c2ab9b8-388a-1ce0-23a3-7acf2a0ed3c6@postgrespro.ru	2017-03-20 16:40:45 -04:00
Robert Haas	953477ca35	Fixes for single-page hash index vacuum. Clear LH_PAGE_HAS_DEAD_TUPLES during replay, similar to what gets done for btree. Update hashdesc.c for xl_hash_vacuum_one_page. Oversights in commit `6977b8b7f4` spotted by Amit Kapila. Patch by Ashutosh Sharma. Bump WAL version. The original patch to make hash indexes write-ahead logged probably should have done this, and the single page vacuuming patch probably should have done it again, but better late than never. Discussion: http://postgr.es/m/CAA4eK1Kd=mJ9xreovcsh0qMiAj-QqCphHVQ_Lfau1DR9oVjASQ@mail.gmail.com	2017-03-20 15:49:09 -04:00
Tom Lane	bc18126a6b	Add configure test to see if the C compiler has gcc-style computed gotos. We'll need this for the upcoming patch to speed up expression evaluation. Might as well push it now to see if it behaves sanely in the buildfarm. Andres Freund Discussion: https://postgr.es/m/20170320062511.hp5qeurtxrwsvfxr@alap3.anarazel.de	2017-03-20 13:35:26 -04:00
Tom Lane	17f8ffa1e3	Fix REFRESH MATERIALIZED VIEW to report activity to the stats collector. The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to report its updates to the stats collector. This is bad since it means auto-analyze doesn't know there's any work to be done. Adjust it to report the refresh as a table truncate followed by insertion of an appropriate number of rows. Since a matview could contain more than INT_MAX rows, change the signature of pgstat_count_heap_insert() to accept an int64 rowcount. (The accumulator it's adding into is already int64, but existing callers could not insert more than a small number of rows at once, so the argument had been declared just "int n".) This is surely a bug fix, but changing pgstat_count_heap_insert()'s API seems too risky for the back branches. Given the lack of previous complaints, I'm not sure it's a big enough problem to justify a kluge solution that would avoid that. So, no back-patch, at least for now. Jim Mlodgenski, adjusted a bit by me Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com	2017-03-18 17:49:39 -04:00
Robert Haas	249cf070e3	Create and use wait events for read, write, and fsync operations. Previous commits, notably `53be0b1add` and `6f3bd98ebf`, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com	2017-03-18 07:43:01 -04:00
Robert Haas	88e66d193f	Rename "pg_clog" directory to "pg_xact". Names containing the letters "log" sometimes confuse users into believing that only non-critical data is present. It is hoped this renaming will discourage ill-considered removals of transaction status data. Michael Paquier Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com	2017-03-17 09:48:38 -04:00
Heikki Linnakangas	c6305a9c57	Allow plaintext 'password' authentication when user has a SCRAM verifier. Oversight in the main SCRAM patch.	2017-03-17 11:33:27 +02:00
Robert Haas	befd73c50f	Add pg_ls_logdir() and pg_ls_waldir() functions. These functions are intended to be used by monitoring tools, and, unlike pg_ls_dir(), access to them can be granted to non-superusers, so that those monitoring tools can observe the principle of least privilege. Dave Page, revised by me, and also reviewed a bit by Thomas Munro. Discussion: http://postgr.es/m/CA+OCxow-X=D2fWdKy+HP+vQ1LtrgbsYQ=CshzZBqyFT5jOYrFw@mail.gmail.com	2017-03-16 15:05:02 -04:00
Robert Haas	6977b8b7f4	Port single-page btree vacuum logic to hash indexes. This is advantageous for hash indexes for the same reasons it's good for btrees: it accelerates space recycling, reducing bloat. Ashutosh Sharma, reviewed by Amit Kapila and by me. A bit of additional hacking by me. Discussion: http://postgr.es/m/CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw@mail.gmail.com	2017-03-15 22:18:56 -04:00
Stephen Frost	c583234662	Bump catversion for MACADDR8 Pointed out by Robert.	2017-03-15 11:19:39 -04:00
Stephen Frost	c7a9fa399d	Add support for EUI-64 MAC addresses as macaddr8 This adds in support for EUI-64 MAC addresses by adding a new data type called 'macaddr8' (using our usual convention of indicating the number of bytes stored). This was largely a copy-and-paste from the macaddr data type, with appropriate adjustments for having 8 bytes instead of 6 and adding support for converting a provided EUI-48 (6 byte format) to the EUI-64 format. Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and 5th bytes but does not perform the IPv6 modified EUI-64 action of flipping the 7th bit, but we add a function to perform that specific action for the user as it may be commonly done by users who wish to calculate their IPv6 address based on their network prefix and 48-bit MAC address. Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me. Reviewed by: Vitaly Burovoy, Kuntal Ghosh Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=TctNQ+EfkeCEHtMU_yo1mvX8hsk_ghNQ@mail.gmail.com	2017-03-15 11:16:25 -04:00
Peter Eisentraut	aefeb68741	Allow referring to functions without arguments when unique In DDL commands referring to an existing function, allow omitting the argument list if the function name is unique in its schema, per SQL standard. This uses the same logic that the regproc type uses for finding functions by name only. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-14 23:55:19 -04:00
Peter Eisentraut	eb4da3e380	Add option to control snapshot export to CREATE_REPLICATION_SLOT We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT in the replication protocol, but several upcoming patches want more control over what happens. Suppress snapshot export in pg_recvlogical, which neither needs nor can use the exported snapshot. Since snapshot exporting can fail this improves reliability. This also paves the way for allowing the creation of replication slots on standbys, which cannot export snapshots because they cannot allocate new XIDs. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-14 17:34:22 -04:00
Robert Haas	bb4a39637a	hash: Support WAL consistency checking. Kuntal Ghosh, reviewed by Amit Kapila and Ashutosh Sharma, with a few tweaks by me. Discussion: http://postgr.es/m/CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com	2017-03-14 14:58:56 -04:00
Robert Haas	2609e91fcf	Fix regression in parallel planning against inheritance tables. Commit `51ee6f3160` accidentally changed the behavior around inheritance hierarchies; before, we always considered parallel paths even for very small inheritance children, because otherwise an inheritance hierarchy with even one small child wouldn't be eligible for parallelism. That exception was inadverently removed; put it back. In passing, also adjust the degree-of-parallelism comptuation for index-only scans not to consider the number of heap pages fetched. Otherwise, we'll avoid parallel index-only scans on tables that are mostly all-visible, which isn't especially logical. Robert Haas and Amit Kapila, per a report from Ashutosh Sharma. Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com	2017-03-14 14:33:14 -04:00
Robert Haas	c11453ce0a	hash: Add write-ahead logging support. The warning about hash indexes not being write-ahead logged and their use being discouraged has been removed. "snapshot too old" is now supported for tables with hash indexes. Most importantly, barring bugs, hash indexes will now be crash-safe and usable on standbys. This commit doesn't yet add WAL consistency checking for hash indexes, as we now have for other index types; a separate patch has been submitted to cure that lack. Amit Kapila, reviewed and slightly modified by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen. Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com	2017-03-14 13:27:02 -04:00
Peter Eisentraut	a47b38c9ee	Spelling fixes From: Josh Soref <jsoref@gmail.com>	2017-03-14 12:58:39 -04:00
Peter Eisentraut	f97a028d8e	Spelling fixes in code comments From: Josh Soref <jsoref@gmail.com>	2017-03-14 12:58:39 -04:00
Tom Lane	5ed6fff6b7	Make logging about multixact wraparound protection less chatty. The original messaging design, introduced in commit `068cfadf9`, seems too chatty now that some time has elapsed since the bug fix; most installations will be in good shape and don't really need a reminder about this on every postmaster start. Hence, arrange to suppress the "wraparound protections are now enabled" message during startup (specifically, during the TrimMultiXact() call). The message will still appear if protection becomes effective at some later point. Discussion: https://postgr.es/m/17211.1489189214@sss.pgh.pa.us	2017-03-14 12:47:53 -04:00
Robert Haas	87f9982034	Fix failure to mark init buffers as BM_PERMANENT. This could result in corruption of the init fork of an unlogged index if the ambuildempty routine for that index used shared buffers to create the init fork, which was true for brin, gin, gist, and hash indexes. Patch by me, based on an earlier patch by Michael Paquier, who also reviewed this one. This also incorporates an idea from Artur Zakirov. Discussion: http://postgr.es/m/CACYUyc8yccE4xfxhqxfh_Mh38j7dRFuxfaK1p6dSNAEUakxUyQ@mail.gmail.com	2017-03-14 11:51:11 -04:00
Tom Lane	895e36bb3f	Add a "void *" passthrough pointer for psqlscan.l's callback functions. The immediate motivation for this is to provide clean infrastructure for the proposed \if...\endif patch for psql; but it seems like a good thing to have even if that patch doesn't get in. Previously the callback functions could only make use of application-global state, which is a pretty severe handicap. For the moment, the pointer is only passed through to the get_variable callback function. I considered also passing it to the write_error callback, but for now let's not. Neither psql nor pgbench has a use for that, and in the case of psql we'd have to invent a separate wrapper function because we would certainly not want to change the signature of psql_error(). Discussion: https://postgr.es/m/10108.1489418309@sss.pgh.pa.us	2017-03-13 17:14:46 -04:00
Heikki Linnakangas	aeed17d000	Use radix tree for character encoding conversions. Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp	2017-03-13 20:46:39 +02:00
Noah Misch	9d7726c2ba	Recommend wrappers of PG_DETOAST_DATUM_PACKED(). When commit `3e23b68dac` introduced single-byte varlena headers, its fmgr.h changes presented PG_GETARG_TEXT_PP() and PG_GETARG_TEXT_P() as equals. Its postgres.h changes presented PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the exceptional case. Now, instead, firmly recommend PG_GETARG_TEXT_PP() over PG_GETARG_TEXT_P(); likewise for other ...PP() macros. This shaves cycles and invites consistency of style.	2017-03-12 19:35:33 -04:00
Noah Misch	9e0926468a	Fix comment about length of text, bytea, etc. When commit `3e23b68dac` introduced single-byte varlena headers, it rendered this comment incomplete.	2017-03-12 19:35:30 -04:00
Robert Haas	390811750d	Revert "Use group updates when setting transaction status in clog." This reverts commit `ccce90b398`. This optimization is unsafe, at least, of rollbacks and rollbacks to savepoints, but I'm concerned there may be other problematic cases as well. Therefore, I've decided to revert this pending further investigation.	2017-03-10 14:49:56 -05:00
Andres Freund	f8f1430ae7	Enable 64 bit atomics on ARM64. Previously they were disabled due to performance concerns on 32bit arm, where 64bit atomics are often implemented via kernel traps. Author: Roman Shaposhnik Discussion: http://postgr.es/m/CA+ULb+uErkFuXUCCXWHYvnV5KnAyjGUzzRcPA-M0cgO+Hm4RSA@mail.gmail.com	2017-03-10 11:19:54 -08:00
Tom Lane	8b358b42f8	Change the relkind for partitioned tables from 'P' to 'p'. Seven of the eight other relkind codes are lower-case, so it wasn't consistent for this one to be upper-case. Fix it while we still can. Historical notes: the reason for the lone exception, i.e. sequences being 'S', is that 's' was once used for "special" relations. Also, at one time the partitioned-tables patch used both 'P' and 'p', but that got changed, leaving only a surprising choice behind. This also fixes a couple little bits of technical debt, such as type_sanity.sql not knowing that 'm' is a legal value for relkind. Discussion: https://postgr.es/m/27899.1488909319@sss.pgh.pa.us	2017-03-10 13:15:47 -05:00
Tom Lane	56018bf26e	contrib/amcheck needs RecentGlobalXmin to be PGDLLIMPORT'ified. Per buildfarm. Maybe some of the other xmin variables in snapmgr.h ought to get this too, but for the moment I'm just interested in un-breaking the buildfarm.	2017-03-09 22:55:46 -05:00
Tom Lane	9cfc4deeb9	Make CppAsString2() more visible in c.h. For some reason this standard C string-processing hack was buried in an NLS-related section of c.h. Put it beside CppAsString() so that people are more likely to find it and not be tempted to reinvent local copies, as I nearly did. And provide a more helpful comment, too.	2017-03-09 19:19:25 -05:00
Robert Haas	ccce90b398	Use group updates when setting transaction status in clog. Commit `0e141c0fbb` introduced a mechanism to reduce contention on ProcArrayLock by having a single process clear XIDs in the procArray on behalf of multiple processes, reducing the need to hand the lock around. Use a similar mechanism to reduce contention on CLogControlLock. Testing shows that this very significantly reduces the amount of time waiting for CLogControlLock on high-concurrency pgbench tests run on a large multi-socket machines; whether that translates into a TPS improvement depends on how much of that contention is simply shifted to some other lock, particularly WALWriteLock. Amit Kapila, with some cosmetic changes by me. Extensively reviewed, tested, and benchmarked over a period of about 15 months by Simon Riggs, Robert Haas, Andres Freund, Jesper Pedersen, and especially by Tomas Vondra and Dilip Kumar. Discussion: http://postgr.es/m/CAA4eK1L_snxM_JcrzEstNq9P66++F4kKFce=1r5+D1vzPofdtg@mail.gmail.com Discussion: http://postgr.es/m/CAA4eK1LyR2A+m=RBSZ6rcPEwJ=rVi1ADPSndXHZdjn56yqO6Vg@mail.gmail.com Discussion: http://postgr.es/m/91d57161-d3ea-0cc2-6066-80713e4f90d7@2ndquadrant.com	2017-03-09 17:49:01 -05:00
Robert Haas	355d3993c5	Add a Gather Merge executor node. Like Gather, we spawn multiple workers and run the same plan in each one; however, Gather Merge is used when each worker produces the same output ordering and we want to preserve that output ordering while merging together the streams of tuples from various workers. (In a way, Gather Merge is like a hybrid of Gather and MergeAppend.) This works out to a win if it saves us from having to perform an expensive Sort. In cases where only a small amount of data would need to be sorted, it may actually be faster to use a regular Gather node and then sort the results afterward, because Gather Merge sometimes needs to wait synchronously for tuples whereas a pure Gather generally doesn't. But if this avoids an expensive sort then it's a win. Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro, and Neha Sharma, and reviewed and revised by me. Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com	2017-03-09 07:49:29 -05:00
Tom Lane	d6b059ec74	Document intentional violations of header inclusion policy. Although there are good reasons for our policy of including postgres.h as the first #include in every .c file, never from .h files, there are two places where it seems expedient to violate the policy because the alternative is to modify externally-supplied .c files. (In the case of the regexp library, the idea that it's externally-supplied is kind of at odds with reality, but I haven't entirely given up hope that it will become a standalone project some day.) Add some comments to make it explicit that this is a policy violation and provide the reasoning. In passing, move #include "miscadmin.h" out of regcomp.c and into regcustom.h, which is where it should be if we're taking this reasoning seriously at all. Discussion: https://postgr.es/m/CAEepm=2zCoeq3QxVwhS5DFeUh=yU6z81pbWMgfOB8OzyiBwxzw@mail.gmail.com Discussion: https://postgr.es/m/11634.1488932128@sss.pgh.pa.us	2017-03-08 17:01:13 -05:00
Robert Haas	f35742ccb7	Support parallel bitmap heap scans. The index is scanned by a single process, but then all cooperating processes can iterate jointly over the resulting set of heap blocks. In the future, we might also want to support using a parallel bitmap index scan to set up for a parallel bitmap heap scan, but that's a job for another day. Dilip Kumar, with some corrections and cosmetic changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, Thomas Munro, and me. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com	2017-03-08 12:05:43 -05:00
Alvaro Herrera	fcec6caafa	Support XMLTABLE query expression XMLTABLE is defined by the SQL/XML standard as a feature that allows turning XML-formatted data into relational form, so that it can be used as a <table primary> in the FROM clause of a query. This new construct provides significant simplicity and performance benefit for XML data processing; what in a client-side custom implementation was reported to take 20 minutes can be executed in 400ms using XMLTABLE. (The same functionality was said to take 10 seconds using nested PostgreSQL XPath function calls, and 5 seconds using XMLReader under PL/Python). The implemented syntax deviates slightly from what the standard requires. First, the standard indicates that the PASSING clause is optional and that multiple XML input documents may be given to it; we make it mandatory and accept a single document only. Second, we don't currently support a default namespace to be specified. This implementation relies on a new executor node based on a hardcoded method table. (Because the grammar is fixed, there is no extensibility in the current approach; further constructs can be implemented on top of this such as JSON_TABLE, but they require changes to core code.) Author: Pavel Stehule, Álvaro Herrera Extensively reviewed by: Craig Ringer Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com	2017-03-08 12:40:26 -03:00
Robert Haas	d9528604cc	Remove inclusion of postgres.h from a few header files. Thomas Munro, per project policy articuled by Andres Freund and Tom Lane. Discussion: http://postgr.es/m/CAEepm=2zCoeq3QxVwhS5DFeUh=yU6z81pbWMgfOB8OzyiBwxzw@mail.gmail.com	2017-03-08 08:18:12 -05:00
Robert Haas	98e6e89040	tidbitmap: Support shared iteration. When a shared iterator is used, each call to tbm_shared_iterate() returns a result that has not yet been returned to any process attached to the shared iterator. In other words, each cooperating processes gets a disjoint subset of the full result set, but all results are returned exactly once. This is infrastructure for parallel bitmap heap scan. Dilip Kumar. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com	2017-03-08 08:09:38 -05:00
Robert Haas	38305398cd	hash: Refactor hash index creation. The primary goal here is to move all of the related page modifications to a single section of code, in preparation for adding write-ahead logging. In passing, rename _hash_metapinit to _hash_init, since it initializes more than just the metapage. Amit Kapila. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen.	2017-03-07 17:03:51 -05:00
Robert Haas	a71f10189d	Preparatory refactoring for parallel merge join support. Extract the logic used by hash_inner_and_outer into a separate function, get_cheapest_parallel_safe_total_inner, so that it can also be used to plan parallel merge joins. Also, add a require_parallel_safe argument to the existing function get_cheapest_path_for_pathkeys, because parallel merge join needs to find the cheapest path for a given set of pathkeys that is parallel-safe, not just the cheapest one overall. Patch by me, reviewed by Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com	2017-03-07 10:33:29 -05:00
Heikki Linnakangas	55acfcbffd	Fix comments in SCRAM-SHA-256 patch. Amit Kapila.	2017-03-07 15:24:27 +02:00
Heikki Linnakangas	818fd4a67d	Support SCRAM-SHA-256 authentication (RFC 5802 and 7677). This introduces a new generic SASL authentication method, similar to the GSS and SSPI methods. The server first tells the client which SASL authentication mechanism to use, and then the mechanism-specific SASL messages are exchanged in AuthenticationSASLcontinue and PasswordMessage messages. Only SCRAM-SHA-256 is supported at the moment, but this allows adding more SASL mechanisms in the future, without changing the overall protocol. Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later. The SASLPrep algorithm, for pre-processing the password, is not yet implemented. That could cause trouble, if you use a password with non-ASCII characters, and a client library that does implement SASLprep. That will hopefully be added later. Authorization identities, as specified in the SCRAM-SHA-256 specification, are ignored. SET SESSION AUTHORIZATION provides more or less the same functionality, anyway. If a user doesn't exist, perform a "mock" authentication, by constructing an authentic-looking challenge on the fly. The challenge is derived from a new system-wide random value, "mock authentication nonce", which is created at initdb, and stored in the control file. We go through these motions, in order to not give away the information on whether the user exists, to unauthenticated users. Bumps PG_CONTROL_VERSION, because of the new field in control file. Patch by Michael Paquier and Heikki Linnakangas, reviewed at different stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev, and many others. Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi	2017-03-07 14:25:40 +02:00
Heikki Linnakangas	273c458a2b	Refactor SHA2 functions and move them to src/common/. This way both frontend and backends can use them. The functions are taken from pgcrypto, which now fetches the source files it needs from src/common/. A new interface is designed for the SHA2 functions, which allow linking to either OpenSSL or the in-core stuff taken from KAME as needed. Michael Paquier, reviewed by Robert Haas. Discussion: https://www.postgresql.org/message-id/CAB7nPqTGKuTM5jiZriHrNaQeVqp5e_iT3X4BFLWY_HyHxLvySQ%40mail.gmail.com	2017-03-07 14:23:49 +02:00
Andres Freund	d4c62a6b62	Make simplehash.h grow hashtable in additional cases. Increase the size when either the distance between actual and optimal slot grows too large, or when too many subsequent entries would have to be moved. This addresses reports that the simplehash performed, sometimes considerably, worse than dynahash. The reason turned out to be that insertions into the hashtable where, due to the use of parallel query, in effect done from another hashtable, in hash-value order. If the target hashtable, due to mis-estimation, was sized a lot smaller than the source table(s) that lead to very imbalanced tables; a lot of entries in many close-by buckets from the source tables were inserted into a single, wider, bucket on the target table. As the growth factor was solely computed based on the fillfactor, the performance of the table decreased further and further. `b81b5a96f4` was an attempt to address this problem for hash aggregates (but not for bitmap scans), but it turns out that the current method of mixing hash values often actually leaves neighboring hash-values close to each other, just in different value range. It might be worth revisiting that independently of the performance issues addressed in this patch.. To address that problem resize tables in two additional cases: Firstly when the optimal position for an entry would be far from the actual position, secondly when many entries would have to be moved to make space for the new entry (while satisfying the robin hood property). Due to the additional resizing threshold it seems possible, and testing confirms that so far, that a higher fillfactor doesn't hurt performance and saves a bit of memory. It seems better to increase it now, before a release containing any of this code, rather than wonder in some later release. The various boundaries aren't determined in a particularly scientific manner, they might need some fine-tuning. In all my tests the new code now, even with parallelism, performs at least as good as the old code, in several scenarios significantly better. Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh Discussion: https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com	2017-03-06 14:13:06 -08:00
Peter Eisentraut	2ca64c6f71	Replace LookupFuncNameTypeNames() with LookupFuncWithArgs() The old function took function name and function argument list as separate arguments. Now that all function signatures are passed around as ObjectWithArgs structs, this is no longer necessary and can be replaced by a function that takes ObjectWithArgs directly. Similarly for aggregates and operators. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	8b6d6cf853	Remove objname/objargs split for referring to objects In simpler times, it might have worked to refer to all kinds of objects by a list of name components and an optional argument list. But this doesn't work for all objects, which has resulted in a collection of hacks to place various other nodes types into these fields, which have to be unpacked at the other end. This makes it also weird to represent lists of such things in the grammar, because they would have to be lists of singleton lists, to make the unpacking work consistently. The other problem is that keeping separate name and args fields makes it awkward to deal with lists of functions. Change that by dropping the objargs field and have objname, renamed to object, be a generic Node, which can then be flexibly assigned and managed using the normal Node mechanisms. In many cases it will still be a List of names, in some cases it will be a string Value, for types it will be the existing Typename, for functions it will now use the existing ObjectWithArgs node type. Some of the more obscure object types still use somewhat arbitrary nested lists. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	550214a4ef	Add operator_with_argtypes grammar rule This makes the handling of operators similar to that of functions and aggregates. Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	63ebd377a6	Use class_args field in opclass_drop This makes it consistent with the usage in opclass_item. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Robert Haas	9fe3c644a7	Mark pg_start_backup and pg_stop_backup as parallel-restricted. They depend on backend-private state that will not be synchronized by the parallel machinery, so they should not be marked parallel-safe. This issue also exists in 9.6, but we obviously can't do anything about 9.6 clusters that already exist. Possibly this could be back-patched so that future 9.6 clusters would come out OK, or possibly we should back-patch some other fix, but that would need more discussion. David Steele, reviewed by Michael Paquier Discussion: http://postgr.es/m/CA+TgmoYCWfO2UM-t=HUMFJyxJywLDiLL0nAJpx88LKtvBvNECw@mail.gmail.com	2017-03-06 12:41:55 -05:00
Peter Eisentraut	272adf4f9c	Disallow CREATE/DROP SUBSCRIPTION in transaction block Disallow CREATE SUBSCRIPTION and DROP SUBSCRIPTION in a transaction block when the replication slot is to be created or dropped, since that cannot be rolled back. based on patch by Masahiko Sawada <sawada.mshk@gmail.com>	2017-03-03 23:29:13 -05:00
Peter Eisentraut	6f236e1eb8	psql: Add tab completion for logical replication Add tab completion for publications and subscriptions. Also, to be able to get a list of subscriptions, make pg_subscription world-readable but revoke access to subconninfo using column privileges. From: Michael Paquier <michael.paquier@gmail.com>	2017-03-03 14:13:48 -05:00
Peter Eisentraut	1e8a850094	Use asynchronous connect API in libpqwalreceiver This makes the connection attempt from CREATE SUBSCRIPTION and from WalReceiver interruptable by the user in case the libpq connection is hanging. The previous coding required immediate shutdown (SIGQUIT) of PostgreSQL in that situation. From: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Thom Brown <thom@linux.com>	2017-03-03 09:13:58 -05:00
Robert Haas	19dc233c32	Add pg_current_logfile() function. The syslogger will write out the current stderr and csvlog names, if it's running and there are any, to a new file in the data directory called "current_logfiles". We take care to remove this file when it might no longer be valid (but not at shutdown). The function pg_current_logfile() can be used to read the entries in the file. Gilles Darold, reviewed and modified by Karl O. Pinc, Michael Paquier, and me. Further review by Álvaro Herrera and Christoph Berg.	2017-03-03 11:43:11 +05:30
Robert Haas	aea5d29836	Notify bgworker registrant after freeing worker slot. Tom Lane observed buildfarm failures caused by the select_parallel regression test trying to launch new parallel queries before the worker slots used by the previous ones were freed. Try to fix this by having the postmaster free the worker slots before it sends the SIGUSR1 notifications to the registering process. This doesn't completely eliminate the possibility that the user backend might (correctly) observe the worker as dead before the slot is free, but I believe it should make the window significantly narrower. Patch by me, per complaint from Tom Lane. Reviewed by Amit Kapila. Discussion: http://postgr.es/m/30673.1487310734@sss.pgh.pa.us	2017-03-03 09:25:30 +05:30
Robert Haas	5a73e17317	Improve error reporting for tuple-routing failures. Currently, the whole row is shown without column names. Instead, adopt a style similar to _bt_check_unique() in ExecFindPartition() and show the failing key: (key1, ...) = (val1, ...). Amit Langote, per a complaint from Simon Riggs. Reviewed by me; I also adjusted the grammar in one of the comments. Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp	2017-03-03 09:09:52 +05:30
Andres Freund	8f7277dfb5	Fix s/ITERTOR/ITERATOR/ typo in simplehash.h. This could lead to problem when simplehash.h is used to define two different types of hashtable visible in the same translation unit. Reported-By: Josh Soref Discussion: https://postgr.es/m/CACZqfqCC7WdBAY=rQePb9-qW1rjdaTdHsV5KoVejHkDb6qrtOg@mail.gmail.com	2017-03-01 10:17:12 -08:00
Peter Eisentraut	005638e988	Fix naming inconsistency subobjid -> objsubid From: Jim Nasby <Jim.Nasby@BlueTreble.com>	2017-03-01 12:22:33 -05:00
Peter Eisentraut	20f6d74242	Collect duplicate copies of oid_cmp()	2017-03-01 11:55:28 -05:00
Peter Eisentraut	788af6f854	Move atooid() definition to a central place	2017-03-01 11:55:28 -05:00
Andres Freund	7e3aa03b41	Reduce size of common allocation header. The new slab allocator needs different per-allocation information than the classical aset.c. The definition in `58b25e981` wasn't sufficiently careful on 32 platforms with 8 byte alignment, leading to buildfarm failures. That's not entirely easy to fix by just adjusting the definition. As slab.c doesn't actually need the size part(s) of the common header, all chunks are equally sized after all, it seems better to instead reduce the header to the part needed by all allocators, namely which context an allocation belongs to. That has the advantage of reducing the overhead of slab allocations, and also allows for more flexibility in future allocators. To avoid spreading the logic about accessing a chunk's context around, centralize it in GetMemoryChunkContext(), which allows to delete a good number of lines. A followup commit will revise the mmgr/README portion about StandardChunkHeader, and more. Author: Andres Freund Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de	2017-02-28 19:42:44 -08:00
Tom Lane	9b88f27cb4	Allow index AMs to return either HeapTuple or IndexTuple format during IOS. Previously, only IndexTuple format was supported for the output data of an index-only scan. This is fine for btree, which is just returning a verbatim index tuple anyway. It's not so fine for SP-GiST, which can return reconstructed data that's much larger than a page. To fix, extend the index AM API so that index-only scan data can be returned in either HeapTuple or IndexTuple format. There's other ways we could have done it, but this way avoids an API break for index AMs that aren't concerned with the issue, and it costs little except a couple more fields in IndexScanDescs. I changed both GiST and SP-GiST to use the HeapTuple method. I'm not very clear on whether GiST can reconstruct data that's too large for an IndexTuple, but that seems possible, and it's not much of a code change to fix. Per a complaint from Vik Fearing. Reviewed by Jason Li. Discussion: https://postgr.es/m/49527f79-530d-0bfe-3dad-d183596afa92@2ndquadrant.fr	2017-02-27 17:20:34 -05:00
Robert Haas	30df93f698	hash: Refactor overflow page allocation. As with commit `b0f18cb77f`, the goal here is to move all of the related page modifications to a single section of code, in preparation for adding write-ahead logging. Amit Kapila, with slight changes by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen, all of whom should also have been credited in the previous commit message.	2017-02-27 22:59:55 +05:30
Robert Haas	b0f18cb77f	hash: Refactor bucket squeeze code. In preparation for adding write-ahead logging to hash indexes, refactor _hash_freeovflpage and _hash_squeezebucket so that all related page modifications happen in a single section of code. The previous coding assumed that it would be fine to move tuples one at a time, and also that the various operations involved in freeing an overflow page didn't necessarily all need to be done together, all of which is true if you don't care about write-ahead logging. Amit Kapila, with slight changes by me.	2017-02-27 22:34:21 +05:30
Peter Eisentraut	2ed193c904	chomp PQerrorMessage() in backend uses PQerrorMessage() returns an error message with a trailing newline, but in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have the error message without that for emitting via ereport(). To simplify that, add a function pchomp() that returns a pstrdup'ed string with the trailing newline characters removed.	2017-02-27 08:54:51 -05:00
Andres Freund	9fab40ad32	Use the new "Slab" context for some allocations in reorderbuffer.h. Note that this change alone does not yet fully address the performance problems triggering this work, a large portion of the slowdown is triggered by the tuple allocator, which isn't converted to the new allocator. It would be possible to do so, but using evenly sized objects, like both the current implementation in reorderbuffer.c and slab.c, wastes a fair amount of memory. A later patch by Tomas will introduce a better approach. Author: Tomas Vondra Reviewed-By: Andres Freund Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com	2017-02-27 03:41:44 -08:00
Andres Freund	58b25e9810	Add "Slab" MemoryContext implementation for efficient equal-sized allocations. The default general purpose aset.c style memory context is not a great choice for allocations that are all going to be evenly sized, especially when those objects aren't small, and have varying lifetimes. There tends to be a lot of fragmentation, larger allocations always directly go to libc rather than have their cost amortized over several pallocs. These problems lead to the introduction of ad-hoc slab allocators in reorderbuffer.c. But it turns out that the simplistic implementation leads to problems when a lot of objects are allocated and freed, as aset.c is still the underlying implementation. Especially freeing can easily run into O(n^2) behavior in aset.c. While the O(n^2) behavior in aset.c can, and probably will, be addressed, custom allocators for this behavior are more efficient both in space and time. This allocator is for evenly sized allocations, and supports both cheap allocations and freeing, without fragmenting significantly. It does so by allocating evenly sized blocks via malloc(), and carves them into chunks that can be used for allocations. In order to release blocks to the OS as early as possible, chunks are allocated from the fullest block that still has free objects, increasing the likelihood of a block being entirely unused. A subsequent commit uses this in reorderbuffer.c, but a further allocator is needed to resolve the performance problems triggering this work. There likely are further potentialy uses of this allocator besides reorderbuffer.c. There's potential further optimizations of the new slab.c, in particular the array of freelists could be replaced by a more intelligent structure - but for now this looks more than good enough. Author: Tomas Vondra, editorialized by Andres Freund Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com	2017-02-27 03:41:44 -08:00
Andres Freund	bfd12cccbd	Make useful infrastructure from aset.c generally available. An upcoming patch introduces a new type of memory context. To avoid duplicating debugging infrastructure within aset.c, move useful pieces to memdebug.[ch]. While touching aset.c, fix printf format code in AllocFree* debug macros. Author: Tomas Vondra Reviewed-By: Andres Freund Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com	2017-02-27 03:41:44 -08:00
Robert Haas	a315b967cc	Allow custom and foreign scans to have shutdown callbacks. This is expected to be useful mostly when performing such scans in parallel, because in that case it allows (in combination with commit `acf555bc53`) nodes below a Gather to get control just before the DSM segment goes away. KaiGai Kohei, except that I rewrote the documentation. Reviewed by Claudio Freire. Discussion: http://postgr.es/m/CADyhKSXJK0jUJ8rWv4AmKDhsUh124_rEn39eqgfC5D8fu6xVuw@mail.gmail.com	2017-02-26 13:41:12 +05:30
Tom Lane	2bd7f85796	Remove some configure header-file checks that we weren't really using. We had some AC_CHECK_HEADER tests that were really wastes of cycles, because the code proceeded to #include those headers unconditionally anyway, in all or a large majority of cases. The lack of complaints shows that those headers are available on every platform of interest, so we might as well let configure run a bit faster by not probing those headers at all. I suspect that some of the tests I left alone are equally useless, but since all the existing #includes of the remaining headers are properly guarded, I didn't touch them.	2017-02-25 18:10:09 -05:00
Tom Lane	9e3755ecb2	Remove useless duplicate inclusions of system header files. c.h #includes a number of core libc header files, such as <stdio.h>. There's no point in re-including these after having read postgres.h, postgres_fe.h, or c.h; so remove code that did so. While at it, also fix some places that were ignoring our standard pattern of "include postgres[_fe].h, then system header files, then other Postgres header files". While there's not any great magic in doing it that way rather than system headers last, it's silly to have just a few files deviating from the general pattern. (But I didn't attempt to enforce this globally, only in files I was touching anyway.) I'd be the first to say that this is mostly compulsive neatnik-ism, but over time it might save enough compile cycles to be useful.	2017-02-25 16:12:55 -05:00
Tom Lane	41c16edcf6	Fix unportable definition of BSWAP64() macro. We have a portable way of writing uint64 constants, but whoever wrote this macro didn't know about it. While at it, fix unsafe under-parenthesization of arguments. That might be moot, because there are already good reasons not to use the macro on anything more complicated than a simple variable, but it's still poor practice. Per buildfarm warnings.	2017-02-24 15:21:39 -05:00
Tom Lane	c29aff959d	Consistently declare timestamp variables as TimestampTz. Twiddle the replication-related code so that its timestamp variables are declared TimestampTz, rather than the uninformative "int64" that was previously used for meant-to-be-always-integer timestamps. This resolves the int64-vs-TimestampTz declaration inconsistencies introduced by commit `7c030783a`, though in the opposite direction to what was originally suggested. This required including datatype/timestamp.h in a couple more places than before. I decided it would be a good idea to slim down that header by not having it pull in <float.h> etc, as those headers are no longer at all relevant to its purpose. Unsurprisingly, a small number of .c files turn out to have been depending on those inclusions, so add them back in the .c files as needed. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us	2017-02-23 15:57:08 -05:00
Tom Lane	b9d092c962	Remove now-dead code for !HAVE_INT64_TIMESTAMP. This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP tests and the negative-case controlled code. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us	2017-02-23 14:04:43 -05:00
Tom Lane	d28aafb6dd	Remove pg_control's enableIntTimes field. We don't need it any more. pg_controldata continues to report that date/time type storage is "64-bit integers", but that's now a hard-wired behavior not something it sees in the data. This avoids breaking pg_upgrade, and perhaps other utilities that inspect pg_control this way. Ditto for pg_resetwal. I chose to remove the "bigint_timestamps" output column of pg_control_init(), though, as that function hasn't been around long and probably doesn't have ossified users. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us	2017-02-23 12:23:12 -05:00
Tom Lane	b6aa17e0ae	De-support floating-point timestamps. Per discussion, the time has come to do this. The handwriting has been on the wall at least since 9.0 that this would happen someday, whenever it got to be too much of a burden to support the float-timestamp option. The triggering factor now is the discovery that there are multiple bugs in the code that attempts to implement use of integer timestamps in the replication protocol even when the server is built for float timestamps. The internal float timestamps leak into the protocol fields in places. While we could fix the identified bugs, there's a very high risk of introducing more. Trying to build a wall that would positively prevent mixing integer and float timestamps is more complexity than we want to undertake to maintain a long-deprecated option. The fact that these bugs weren't found through testing also indicates a lack of interest in float timestamps. This commit disables configure's --disable-integer-datetimes switch (it'll still accept --enable-integer-datetimes, though), removes direct references to USE_INTEGER_DATETIMES, and removes discussion of float timestamps from the user documentation. A considerable amount of code is rendered dead by this, but removing that will occur as separate mop-up. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us	2017-02-23 11:40:20 -05:00
Peter Eisentraut	e8d016d819	Remove deprecated COMMENT ON RULE syntax This was only used for allowing upgrades from pre-7.3 instances, which was a long time ago.	2017-02-23 08:19:52 -05:00
Robert Haas	4c728f3829	Pass the source text for a parallel query to the workers. With this change, you can see the query that a parallel worker is executing in pg_stat_activity, and if the worker crashes you can see what query it was executing when it crashed. Rafia Sabih, reviewed by Kuntal Ghosh and Amit Kapila and slightly revised by me.	2017-02-22 12:18:29 +05:30
Robert Haas	0414b26bac	Add optimizer and executor support for parallel index-only scans. Commit `5262f7a4fc` added similar support for parallel index scans; this extends that work to index-only scans. As with parallel index scans, this requires support from the index AM, so currently parallel index-only scans will only be possible for btree indexes. Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja, and Amit Kapila Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com	2017-02-19 15:57:55 +05:30
Robert Haas	16be2fd100	Make dsa_allocate interface more like MemoryContextAlloc. A new function dsa_allocate_extended now takes flags which indicate that huge allocations should be permitted, that out-of-memory conditions should not throw an error, and/or that the returned memory should be zero-filled, just like MemoryContextAllocateExtended. Commit `9acb85597f`, which added dsa_allocate0, was broken because it failed to account for the possibility that dsa_allocate() might return InvalidDsaPointer. This fixes that problem along the way. Thomas Munro, with some comment changes by me. Discussion: http://postgr.es/m/CA+Tgmobt7CcF_uQP2UQwWmu4K9qCHehMJP9_9m1urwP8hbOeHQ@mail.gmail.com	2017-02-19 13:59:53 +05:30
Peter Eisentraut	e7e4cd1ab5	Fix typo on comment	2017-02-16 23:53:01 -05:00
Robert Haas	9acb85597f	Add new function dsa_allocate0. This does the same thing as dsa_allocate, except that the memory is guaranteed to be zero-filled on return. Dilip Kumar, adjusted by me.	2017-02-16 12:57:03 -05:00
Tom Lane	93e6e40574	Formatting and docs corrections for logical decoding output plugins. Make the typedefs for output plugins consistent with project style; they were previously not even consistent with each other as to layout or inclusion of parameter names. Make the documentation look the same, and fix errors therein (missing and misdescribed parameters). Back-patch because of the documentation bugs.	2017-02-15 18:15:47 -05:00
Robert Haas	5262f7a4fc	Add optimizer and executor support for parallel index scans. In combination with `569174f1be`, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.	2017-02-15 13:53:24 -05:00
Robert Haas	51ee6f3160	Replace min_parallel_relation_size with two new GUCs. When min_parallel_relation_size was added, the only supported type of parallel scan was a parallel sequential scan, but there are pending patches for parallel index scan, parallel index-only scan, and parallel bitmap heap scan. Those patches introduce two new types of complications: first, what's relevant is not really the total size of the relation but the portion of it that we will scan; and second, index pages and heap pages shouldn't necessarily be treated in exactly the same way. Typically, the number of index pages will be quite small, but that doesn't necessarily mean that a parallel index scan can't pay off. Therefore, we introduce min_parallel_table_scan_size, which works out a degree of parallelism for scans based on the number of table pages that will be scanned (and which is therefore equivalent to min_parallel_relation_size for parallel sequential scans) and also min_parallel_index_scan_size which can be used to work out a degree of parallelism based on the number of index pages that will be scanned. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com	2017-02-15 13:37:24 -05:00
Robert Haas	5d40286985	Fix wrong articles in pg_proc descriptions. This technically should involve a catversion bump, but that seems pedantic, so I skipped it. Report and patch by David Christensen.	2017-02-15 12:13:38 -05:00
Peter Eisentraut	6d16ecc646	Add CREATE COLLATION IF NOT EXISTS clause The core of the functionality was already implemented when pg_import_system_collations was added. This just exposes it as an option in the SQL command.	2017-02-15 10:01:28 -05:00
Robert Haas	569174f1be	btree: Support parallel index scans. This isn't exposed to the optimizer or the executor yet; we'll add support for those things in a separate patch. But this puts the basic mechanism in place: several processes can attach to a parallel btree index scan, and each one will get a subset of the tuples that would have been produced by a non-parallel scan. Each index page becomes the responsibility of a single worker, which then returns all of the TIDs on that page. Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.	2017-02-15 07:41:14 -05:00
Robert Haas	5e6d8d2bbb	Allow parallel workers to execute subplans. This doesn't do anything to make Param nodes anything other than parallel-restricted, so this only helps with uncorrelated subplans, and it's not necessarily very cheap because each worker will run the subplan separately (just as a Hash Join will build a separate copy of the hash table in each participating process), but it's a first step toward supporting cases that are more likely to help in practice, and is occasionally useful on its own. Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and me. Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com	2017-02-14 18:16:03 -05:00
Robert Haas	8da9a22636	Split index xlog headers from other private index headers. The xlog-specific headers need to be included in both frontend code - specifically, pg_waldump - and the backend, but the remainder of the private headers for each index are only needed by the backend. By splitting the xlog stuff out into separate headers, pg_waldump pulls in fewer backend headers, which is a good thing. Patch by me, reviewed by Michael Paquier and Andres Freund, per a complaint from Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com	2017-02-14 15:37:59 -05:00
Robert Haas	fb47544d0c	Minor fixes for WAL consistency checking. Michael Paquier, reviewed and slightly revised by me. Discussion: http://postgr.es/m/CAB7nPqRzCQb=vdfHvMtP0HMLBHU6z1aGdo4GJsUP-HP8jx+Pkw@mail.gmail.com	2017-02-14 12:41:01 -05:00
Peter Eisentraut	2ea5b06c7a	Add CREATE SEQUENCE AS <data type> clause This stores a data type, required to be an integer type, with the sequence. The sequences min and max values default to the range supported by the type, and they cannot be set to values exceeding that range. The internal implementation of the sequence is not affected. Change the serial types to create sequences of the appropriate type. This makes sure that the min and max values of the sequence for a serial column match the range of values supported by the table column. So the sequence can no longer overflow the table column. This also makes monitoring for sequence exhaustion/wraparound easier, which currently requires various contortions to cross-reference the sequences with the table columns they are used with. This commit also effectively reverts the pg_sequence column reordering in `f3b421da5f`, because the new seqtypid column allows us to fill the hole in the struct and create a more natural overall column ordering. Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-02-10 15:34:35 -05:00
Robert Haas	85c11324ca	Rename user-facing tools with "xlog" in the name to say "wal". This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.	2017-02-09 16:23:46 -05:00
Robert Haas	806091c96f	Remove all references to "xlog" from SQL-callable functions in pg_proc. Commit `f82ec32ac3` renamed the pg_xlog directory to pg_wal. To make things consistent, and because "xlog" is terrible terminology for either "transaction log" or "write-ahead log" rename all SQL-callable functions that contain "xlog" in the name to instead contain "wal". (Note that this may pose an upgrade hazard for some users.) Similarly, rename the xlog_position argument of the functions that create slots to be called wal_position. Discussion: https://www.postgresql.org/message-id/CA+Tgmob=YmA=H3DbW1YuOXnFVgBheRmyDkWcD9M8f=5bGWYEoQ@mail.gmail.com	2017-02-09 15:10:09 -05:00
Robert Haas	72257f9578	simplehash: Additional tweaks to make specifying an allocator work. Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we still need prototypes. The user can't define them before including simplehash.h because SH_TYPE isn't available yet. For the allocator to be able to access private_data, it needs to become an argument to SH_CREATE. Previously we relied on callers to set that after returning from SH_CREATE, but SH_CREATE calls SH_ALLOCATE before returning. Dilip Kumar, reviewed by me.	2017-02-09 14:59:57 -05:00
Tom Lane	86d911ec0f	Allow index AMs to cache data across aminsert calls within a SQL command. It's always been possible for index AMs to cache data across successive amgettuple calls within a single SQL command: the IndexScanDesc.opaque field is meant for precisely that. However, no comparable facility exists for amortizing setup work across successive aminsert calls. This patch adds such a feature and teaches GIN, GIST, and BRIN to use it to amortize catalog lookups they'd previously been doing on every call. (The other standard index AMs keep everything they need in the relcache, so there's little to improve there.) For GIN, the overall improvement in a statement that inserts many rows can be as much as 10%, though it seems a bit less for the other two. In addition, this makes a really significant difference in runtime for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated catalog lookups are vastly more expensive. The reason this has been hard up to now is that the aminsert function is not passed any useful place to cache per-statement data. What I chose to do is to add suitable fields to struct IndexInfo and pass that to aminsert. That's not widening the index AM API very much because IndexInfo is already within the ken of ambuild; in fact, by passing the same info to aminsert as to ambuild, this is really removing an inconsistency in the AM API. Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us	2017-02-09 11:52:12 -05:00
Robert Haas	a507b86900	Add WAL consistency checking facility. When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.	2017-02-08 15:45:30 -05:00
Robert Haas	c3c4f6e174	Revise the way the element allocator for a simplehash is specified. This method is more elegant and more efficient. Per a suggestion from Andres Freund, who also briefly reviewed the patch.	2017-02-07 17:10:08 -05:00
Robert Haas	ac8eb972f2	Avoid redefining simplehash_allocate/simplehash_free. There's no generic guard against multiple inclusion in this file, for good reason. But these typedefs need one, as per a report from Jeff Janes.	2017-02-07 16:20:05 -05:00
Robert Haas	565903af47	Allow the element allocator for a simplehash to be specified. This is infrastructure for a pending patch to allow parallel bitmap heap scans. Dilip Kumar, reviewed (in earlier versions) by Andres Freund and (more recently) by me. Some further renaming by me, also.	2017-02-07 16:01:44 -05:00
Robert Haas	293e24e507	Cache hash index's metapage in rel->rd_amcache. This avoids a very significant amount of buffer manager traffic and contention when scanning hash indexes, because it's no longer necessary to lock and pin the metapage for every scan. We do need some way of figuring out when the cache is too stale to use any more, so that when we lock the primary bucket page to which the cached metapage points us, we can tell whether a split has occurred since we cached the metapage data. To do that, we use the hash_prevblkno field in the primary bucket page, which would otherwise always be set to InvalidBuffer. This patch contains code so that it will continue working (although less efficiently) with hash indexes built before this change, but perhaps we should consider bumping the hash version and ripping out the compatibility code. That decision can be made later, though. Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me. Before committing, I made a number of cosmetic changes to the last posted version of the patch, adjusted _hash_getcachedmetap to be more careful about order of operation, and made some necessary updates to the pageinspect documentation and regression tests.	2017-02-07 12:35:45 -05:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Robert Haas	08bf6e5295	pageinspect: Support hash indexes. Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling improvements by me. Tests from Peter Eisentraut. Reviewed by Álvaro Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter Eisentraut, Amit Kapila, Mithun Cy, and me. Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com	2017-02-02 14:19:32 -05:00
Andrew Dunstan	f1169ab501	Don't count background workers against a user's connection limit. Doing so doesn't seem to be within the purpose of the per user connection limits, and has particularly unfortunate effects in conjunction with parallel queries. Backpatch to 9.6 where parallel queries were introduced. David Rowley, reviewed by Robert Haas and Albe Laurenz.	2017-02-01 18:02:43 -05:00
Tom Lane	aedd554f84	Fix CatalogTupleInsert/Update abstraction for case of shared indstate. Add CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo to let callers use the CatalogTupleXXX abstraction layer even in cases where we want to share the results of CatalogOpenIndexes across multiple inserts/updates for efficiency. This finishes the job begun in commit `2f5c9d9c9`, by allowing some remaining simple_heap_insert/update calls to be replaced. The abstraction layer is now complete enough that we don't have to export CatalogIndexInsert at all anymore. Also, this fixes several places in which `2f5c9d9c9` introduced performance regressions by using retail CatalogTupleInsert or CatalogTupleUpdate even though the previous coding had been able to amortize CatalogOpenIndexes work across multiple tuples. A possible future improvement is to arrange for the indexing.c functions to cache the CatalogIndexState somewhere, maybe in the relcache, in which case we could get rid of CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo again. But that's a task for another day. Discussion: https://postgr.es/m/27502.1485981379@sss.pgh.pa.us	2017-02-01 17:18:36 -05:00
Tom Lane	ab02896510	Provide CatalogTupleDelete() as a wrapper around simple_heap_delete(). This extends the work done in commit `2f5c9d9c9` to provide a more nearly complete abstraction layer hiding the details of index updating for catalog changes. That commit only invented abstractions for catalog inserts and updates, leaving nearby code for catalog deletes still calling the heap-level routines directly. That seems rather ugly from here, and it does little to help if we ever want to shift to a storage system in which indexing work is needed at delete time. Hence, create a wrapper function CatalogTupleDelete(), and replace calls of simple_heap_delete() on catalog tuples with it. There are now very few direct calls of [simple_]heap_delete remaining in the tree. Discussion: https://postgr.es/m/462.1485902736@sss.pgh.pa.us	2017-02-01 16:13:30 -05:00
Heikki Linnakangas	dbd69118c0	Replace isMD5() with a more future-proof way to check if pw is encrypted. The rule is that if pg_authid.rolpassword begins with "md5" and has the right length, it's an MD5 hash, otherwise it's a plaintext password. The idiom has been to use isMD5() to check for that, but that gets awkward, when we add new kinds of verifiers, like the verifiers for SCRAM authentication in the pending SCRAM patch set. Replace isMD5() with a new get_password_type() function, so that when new verifier types are added, we don't need to remember to modify every place that currently calls isMD5(), to also recognize the new kinds of verifiers. Also, use the new plain_crypt_verify function in passwordcheck, so that it doesn't need to know about MD5, or in the future, about other kinds of hashes or password verifiers. Reviewed by Michael Paquier and Peter Eisentraut. Discussion: https://www.postgresql.org/message-id/2d07165c-1793-e243-a2a9-e45b624c7580@iki.fi	2017-02-01 13:11:37 +02:00
Alvaro Herrera	2f5c9d9c9c	Tweak catalog indexing abstraction for upcoming WARM Split the existing CatalogUpdateIndexes into two different routines, CatalogTupleInsert and CatalogTupleUpdate, which do both the heap insert/update plus the index update. This removes over 300 lines of boilerplate code all over src/backend/catalog/ and src/backend/commands. The resulting code is much more pleasing to the eye. Also, by encapsulating what happens in detail during an UPDATE, this facilitates the upcoming WARM patch, which is going to add a few more lines to the update case making the boilerplate even more boring. The original CatalogUpdateIndexes is removed; there was only one use left, and since it's just three lines, we can as well expand it in place there. We could keep it, but WARM is going to break all the UPDATE out-of-core callsites anyway, so there seems to be no benefit in doing so. Author: Pavan Deolasee Discussion: https://www.postgr.es/m/CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com	2017-01-31 18:42:24 -03:00
Tom Lane	de16ab7238	Invent pg_hba_file_rules view to show the content of pg_hba.conf. This view is designed along the same lines as pg_file_settings, to wit it shows what is currently in the file, not what the postmaster has loaded as the active settings. That allows it to be used to pre-vet edits before issuing SIGHUP. As with the earlier view, go out of our way to allow errors in the file to be reflected in the view, to assist that use-case. (We might at some point invent a view to show the current active settings, but this is not that patch; and it's not trivial to do.) Haribabu Kommi, reviewed by Ashutosh Bapat, Michael Paquier, Simon Riggs, and myself Discussion: https://postgr.es/m/CAJrrPGerH4jiwpcXT1-46QXUDmNp2QDrG9+-Tek_xC8APHShYw@mail.gmail.com	2017-01-30 18:00:26 -05:00
Stephen Frost	e54f75722c	Handle ALTER EXTENSION ADD/DROP with pg_init_privs In commit `6c268df`, pg_init_privs was added to track the initial privileges of catalog objects and extensions. Unfortunately, that commit didn't include understanding of ALTER EXTENSION ADD/DROP, which allows the objects associated with an extension to be changed after the initial CREATE EXTENSION script has been run. The result of this meant that ACLs for objects added through ALTER EXTENSION ADD were not recorded into pg_init_privs and we would end up including those ACLs in pg_dump when we shouldn't have. This commit corrects that by making sure to have pg_init_privs updated when ALTER EXTENSION ADD/DROP is run, recording the permissions as they are at ALTER EXTENSION ADD time, and removing any if/when ALTER EXTENSION DROP is called. This issue was pointed out by Moshe Jacobson as commentary on bug #14456 (which was actually a bug about versions prior to 9.6 not handling custom ACLs on extensions correctly, an issue now addressed with pg_init_privs in 9.6). Back-patch to 9.6 where pg_init_privs was introduced.	2017-01-29 23:05:07 -05:00

... 2 3 4 5 6 ...

7870 commits