postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-12 05:32:27 -04:00

Author	SHA1	Message	Date
Andrew Gierth	01bde4fa4c	Implement OR REPLACE option for CREATE AGGREGATE. Aggregates have acquired a dozen or so optional attributes in recent years for things like parallel query and moving-aggregate mode; the lack of an OR REPLACE option to add or change these for an existing agg makes extension upgrades gratuitously hard. Rectify.	2019-03-19 01:16:50 +00:00
Robert Haas	6776142a07	Revise parse tree representation for VACUUM and ANALYZE. Like commit `f41551f61f`, this aims to make it easier to add non-Boolean options to VACUUM (or, in this case, to ANALYZE). Instead of building up a bitmap of options directly in the parser, build up a list of DefElem objects and let ExecVacuum() sort it out; right now, we make no use of the fact that a DefElem can carry an associated value, but it will be easy to make that change in the future. Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoATE4sn0jFFH3NcfUZXkU2BMbjBWB_kDj-XWYA-LXDcQA@mail.gmail.com	2019-03-18 15:14:52 -04:00
Robert Haas	f41551f61f	Fold vacuum's 'int options' parameter into VacuumParams. Many places need both, so this allows a few functions to take one fewer parameter. More importantly, as soon as we add a VACUUM option that takes a non-Boolean parameter, we need to replace 'int options' with a struct, and it seems better to think of adding more fields to VacuumParams rather than passing around both VacuumParams and a separate struct as well. Patch by me, reviewed by Masahiko Sawada Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com	2019-03-18 13:57:33 -04:00
Peter Eisentraut	1ffa59a85c	Fix optimization of foreign-key on update actions In RI_FKey_pk_upd_check_required(), we check among other things whether the old and new key are equal, so that we don't need to run cascade actions when nothing has actually changed. This was using the equality operator. But the effect of this is that if a value in the primary key is changed to one that "looks" different but compares as equal, the update is not propagated. (Examples are float -0 and 0 and case-insensitive text.) This appears to violate the SQL standard, and it also behaves inconsistently if in a multicolumn key another key is also updated that would cause the row to compare as not equal. To fix, if we are looking at the PK table in ri_KeysEqual(), then do a bytewise comparison similar to record_image_eq() instead of using the equality operators. This only makes a difference for ON UPDATE CASCADE, but for consistency we treat all changes to the PK the same. For the FK table, we continue to use the equality operators. Discussion: https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com	2019-03-18 17:19:21 +01:00
Michael Paquier	8b938d36f7	Refactor more code logic to update the control file `ce6afc6` has begun the refactoring work by plugging pg_rewind into a central routine to update the control file, and left around two extra copies, with one in xlog.c for the backend and one in pg_resetwal.c. By adding an extra option to the central routine in controldata_utils.c to control if a flush of the control file needs to be done, it is proving to be straight-forward to make xlog.c and pg_resetwal.c use the central code path at the condition of moving the wait event tracking there. Hence, this allows to have only one central code path to update the control file, shaving the code from the duplicates. This refactoring actually fixes a problem in pg_resetwal. Previously, the control file was first removed before being recreated. So if a crash happened between the moment the file was removed and the moment the file was created, then it would have been possible to not have a control file anymore in the database folder. Author: Fabien Coelho Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre	2019-03-18 12:59:35 +09:00
Alexander Korotkov	142c400d72	Fix make rules for jsonpath grammar making them similar to SQL grammar Reported-by: Jeff Janes, Tom Lane Discussion: https://postgr.es/m/CAMkU%3D1w1qBvoW82ZTFpAKae027R-2OHw-m6ALe0VQRNAFueBVA%40mail.gmail.com	2019-03-17 10:55:52 +03:00
Amit Kapila	f27314ff9a	Update copyright year in files added by `1bb5e78218`.	2019-03-16 16:00:38 +05:30
Alexander Korotkov	16d489b0fe	Numeric error suppression in jsonpath Add support of numeric error suppression to jsonpath as it's required by standard. This commit doesn't use PG_TRY()/PG_CATCH() in order to implement that. Instead, it provides internal versions of numeric functions used, which support error suppression. Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com Author: Alexander Korotkov, Nikita Glukhov Reviewed-by: Tomas Vondra	2019-03-16 12:21:19 +03:00
Alexander Korotkov	72b6460336	Partial implementation of SQL/JSON path language SQL 2016 standards among other things contains set of SQL/JSON features for JSON processing inside of relational database. The core of SQL/JSON is JSON path language, allowing access parts of JSON documents and make computations over them. This commit implements partial support JSON path language as separate datatype called "jsonpath". The implementation is partial because it's lacking datetime support and suppression of numeric errors. Missing features will be added later by separate commits. Support of SQL/JSON features requires implementation of separate nodes, and it will be considered in subsequent patches. This commit includes following set of plain functions, allowing to execute jsonpath over jsonb values: * jsonb_path_exists(jsonb, jsonpath[, jsonb, bool]), * jsonb_path_match(jsonb, jsonpath[, jsonb, bool]), * jsonb_path_query(jsonb, jsonpath[, jsonb, bool]), * jsonb_path_query_array(jsonb, jsonpath[, jsonb, bool]). * jsonb_path_query_first(jsonb, jsonpath[, jsonb, bool]). This commit also implements "jsonb @? jsonpath" and "jsonb @@ jsonpath", which are wrappers over jsonpath_exists(jsonb, jsonpath) and jsonpath_predicate(jsonb, jsonpath) correspondingly. These operators will have an index support (implemented in subsequent patches). Catversion bumped, to add new functions and operators. Code was written by Nikita Glukhov and Teodor Sigaev, revised by me. Documentation was written by Oleg Bartunov and Liudmila Mantrova. The work was inspired by Oleg Bartunov. Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com Author: Nikita Glukhov, Teodor Sigaev, Alexander Korotkov, Oleg Bartunov, Liudmila Mantrova Reviewed-by: Tomas Vondra, Andrew Dunstan, Pavel Stehule, Alexander Korotkov	2019-03-16 12:16:48 +03:00
Peter Eisentraut	893d6f8a1f	Avoid casting away a const	2019-03-16 10:13:03 +01:00
Peter Eisentraut	69039fda83	Add walreceiver API to get remote server version Add a separate walreceiver API function walrcv_server_version() to get the version of the remote server, instead of doing it as part of walrcv_identify_system(). This allows the server version to be available even for uses that don't call IDENTIFY_SYSTEM, and it seems cleaner anyway. This is for an upcoming patch, not currently used. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/20190115071359.GF1433@paquier.xyz	2019-03-15 10:16:26 +01:00
Thomas Munro	bb16aba50c	Enable parallel query with SERIALIZABLE isolation. Previously, the SERIALIZABLE isolation level prevented parallel query from being used. Allow the two features to be used together by sharing the leader's SERIALIZABLEXACT with parallel workers. An extra per-SERIALIZABLEXACT LWLock is introduced to make it safe to share, and new logic is introduced to coordinate the early release of the SERIALIZABLEXACT required for the SXACT_FLAG_RO_SAFE optimization, as follows: The first backend to observe the SXACT_FLAG_RO_SAFE flag (set by some other transaction) will 'partially release' the SERIALIZABLEXACT, meaning that the conflicts and locks it holds are released, but the SERIALIZABLEXACT itself will remain active because other backends might still have a pointer to it. Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own MySerializableXact variable and frees local resources so that it can skip SSI checks for the rest of the transaction. In the special case of the leader process, it transfers the SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be completely released at the end of the transaction after all workers have exited. Remove the serializable_okay flag added to CreateParallelContext() by commit `9da0cc35`, because it's now redundant. Author: Thomas Munro Reviewed-by: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com	2019-03-15 17:47:04 +13:00
Peter Eisentraut	b13a913607	Add BKI_DEFAULT to pg_class.relrewrite This column is always 0 on disk, so it doesn't have to be tracked separately for each entry.	2019-03-14 21:25:39 +01:00
Peter Eisentraut	c6ff0b892c	Refactor ParamListInfo initialization There were six copies of identical nontrivial code. Put it into a function.	2019-03-14 13:30:09 +01:00
Tom Lane	401b87a24f	Sync commentary in transam.h and bki.sgml. Commit `a6417078c` missed updating some comments in transam.h about reservation of high OIDs for development purposes. Also tamp down an over-optimistic comment there about how easy it'd be to change FirstNormalObjectId. Earlier, commit `09568ec3d` failed to update bki.sgml for the split between genbki.pl-assigned OIDs and those assigned during initdb. Also fix genbki.pl so that it will complain if it overruns that split. It's possible that doing so would have no very bad consequences, but that's no excuse for not detecting it.	2019-03-14 00:23:40 -04:00
Thomas Munro	c6c9474aaf	Use condition variables to wait for checkpoints. Previously we used a polling/sleeping loop to wait for checkpoints to begin and end, which leads to up to a couple hundred milliseconds of needless thumb-twiddling. Use condition variables instead. Author: Thomas Munro Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com	2019-03-14 10:59:33 +13:00
Tom Lane	f1d85aa98e	Add support for hyperbolic functions, as well as log10(). The SQL:2016 standard adds support for the hyperbolic functions sinh(), cosh(), and tanh(). POSIX has long required libm to provide those functions as well as their inverses asinh(), acosh(), atanh(). Hence, let's just expose the libm functions to the SQL level. As with the trig functions, we only implement versions for float8, not numeric. For the moment, we'll assume that all platforms actually do have these functions; if experience teaches otherwise, some autoconf effort may be needed. SQL:2016 also adds support for base-10 logarithm, but with the function name log10(), whereas the name we've long used is log(). Add aliases named log10() for the float8 and numeric versions. Lætitia Avrot Discussion: https://postgr.es/m/CAB_COdguG22LO=rnxDQ2DW1uzv8aQoUzyDQNJjrR4k00XSgm5w@mail.gmail.com	2019-03-12 15:55:09 -04:00
Tom Lane	3aa0395d4e	Remove remaining hard-wired OID references in the initial catalog data. In the v11-era commits that taught genbki.pl to resolve symbolic OID references in the initial catalog data, we didn't bother to make every last reference symbolic; some of the catalogs have so few initial rows that it didn't seem worthwhile. However, the new project policy that OIDs assigned by new patches should be automatically renumberable changes this calculus. A patch that wants to add a row in one of these catalogs would have a problem when the OID it assigns gets renumbered. Hence, do the mop-up work needed to make all OID references in initial data be symbolic, and establish an associated project policy that we'll never again write a hard-wired OID reference there. No catversion bump since the contents of postgres.bki aren't actually changed by this commit. Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com	2019-03-12 12:30:35 -04:00
Tom Lane	a6417078c4	Create a script that can renumber manually-assigned OIDs. This commit adds a Perl script renumber_oids.pl, which can reassign a range of manually-assigned OIDs to someplace else by modifying OID fields of the catalog .dat files and OID-assigning macros in the catalog .h files. Up to now, we've encouraged new patches that need manually-assigned OIDs to use OIDs just above the range of existing OIDs. Predictably, this leads to patches stepping on each others' toes, as whichever one gets committed first creates an OID conflict that other patch author(s) have to resolve manually. With the availability of renumber_oids.pl, we can eliminate a lot of this hassle. The new project policy, therefore, is: * Encourage new patches to use high OIDs (the documentation suggests choosing a block of OIDs at random in 8000..9999). * After feature freeze in each development cycle, run renumber_oids.pl to move all such OIDs down to lower numbers, thus freeing the high OID range for the next development cycle. This plan should greatly reduce the risk of OID collisions between concurrently-developed patches. Also, if such a collision happens anyway, we have the option to resolve it without much effort by doing an off-schedule OID renumbering to get the first-committed patch out of the way. Or a patch author could use renumber_oids.pl to change their patch's assignments without much pain. This approach does put a premium on not hard-wiring any OID values in places where renumber_oids.pl and genbki.pl can't fix them. Project practice in that respect seems to be pretty good already, but a follow-on patch will sand down some rough edges. John Naylor and Tom Lane, per an idea of Peter Geoghegan's Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com	2019-03-12 10:50:48 -04:00
Michael Paquier	ce6afc6823	Add routine able to update the control file to src/common/ This adds a new routine to src/common/ which is compatible with both the frontend and backend code, able to update the control file's contents. This is now getting used only by pg_rewind, but some upcoming patches which add more control on checksums for offline instances will make use of it. This could also get used more by the backend as xlog.c has its own flavor of the same logic with some wait events and an additional flush phase before closing the opened file descriptor, but this is let as separate work. Author: Michael Banck, Michael Paquier Reviewed-by: Fabien Coelho, Sergei Kornilov Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de	2019-03-12 10:03:33 +09:00
Andres Freund	32b8f0b033	Remove spurious return. Per buildfarm member anole. Author: Andres Freund	2019-03-11 15:04:00 -07:00
Andres Freund	c2fe139c20	tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql	2019-03-11 12:46:41 -07:00
Amit Kapila	a6e48da088	Fix typos in commit `8586bf7ed8`. Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KNv1Mg2krf4E9ssWFnE=8A9mZ1VbVywXBZTFSzb+wP2g@mail.gmail.com	2019-03-11 09:58:46 -07:00
Alvaro Herrera	af38498d4c	Move hash_any prototype from access/hash.h to utils/hashutils.h ... as well as its implementation from backend/access/hash/hashfunc.c to backend/utils/hash/hashfn.c. access/hash is the place for the hash index AM, not really appropriate for generic facilities, which is what hash_any is; having things the old way meant that anything using hash_any had to include the AM's include file, pointlessly polluting its namespace with unrelated, unnecessary cruft. Also move the HTEqual strategy number to access/stratnum.h from access/hash.h. To avoid breaking third-party extension code, add an #include "utils/hashutils.h" to access/hash.h. (An easily removed line by committers who enjoy their asbestos suits to protect them from angry extension authors.) Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql	2019-03-11 13:17:50 -03:00
Tom Lane	caf626b2cd	Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs. This change makes it possible to specify sub-millisecond delays, which work well on most modern platforms, though that was not true when the cost-delay feature was designed. To support this without breaking existing configuration entries, improve guc.c to allow floating-point GUCs to have units. Also, allow "us" (microseconds) as an input/output unit for time-unit GUCs. (It's not allowed as a base unit, at least not yet.) Likewise change the autovacuum_vacuum_cost_delay reloption to be floating-point; this forces a catversion bump because the layout of StdRdOptions changes. This patch doesn't in itself change the default values or allowed ranges for these parameters, and it should not affect the behavior for any already-allowed setting for them. Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us	2019-03-10 15:01:39 -04:00
Alexander Korotkov	f2e403803f	Support for INCLUDE attributes in GiST indexes Similarly to B-tree, GiST index access method gets support of INCLUDE attributes. These attributes aren't used for tree navigation and aren't present in non-leaf pages. But they are present in leaf pages and can be fetched during index-only scan. The point of having INCLUDE attributes in GiST indexes is slightly different from the point of having them in B-tree. The main point of INCLUDE attributes in B-tree is to define UNIQUE constraint over part of attributes enabled for index-only scan. In GiST the main point of INCLUDE attributes is to use index-only scan for attributes, whose data types don't have GiST opclasses. Discussion: https://postgr.es/m/73A1A452-AD5F-40D4-BD61-978622FF75C1%40yandex-team.ru Author: Andrey Borodin, with small changes by me Reviewed-by: Andreas Karlsson	2019-03-10 11:37:17 +03:00
Magnus Hagander	0516c61b75	Add new clientcert hba option verify-full This allows a login to require both that the cn of the certificate matches (like authentication type cert) and that another authentication method (such as password or kerberos) succeeds as well. The old value of clientcert=1 maps to the new clientcert=verify-ca, clientcert=0 maps to the new clientcert=no-verify, and the new option erify-full will add the validation of the CN. Author: Julian Markwort, Marius Timmer Reviewed by: Magnus Hagander, Thomas Munro	2019-03-09 12:19:47 -08:00
Magnus Hagander	6b9e875f72	Track block level checksum failures in pg_stat_database This adds a column that counts how many checksum failures have occurred on files belonging to a specific database. Both checksum failures during normal backend processing and those created when a base backup detects a checksum failure are counted. Author: Magnus Hagander Reviewed by: Julien Rouhaud	2019-03-09 10:47:30 -08:00
Noah Misch	3c5926301a	Avoid some table rewrites for ALTER TABLE .. SET DATA TYPE timestamp. When the timezone is UTC, timestamptz and timestamp are binary coercible in both directions. See `b8a18ad485` and `c22ecc6562` for the previous attempt in this problem space. Skip the table rewrite; for now, continue to needlessly rewrite any index on an affected column. Reviewed by Simon Riggs and Tom Lane. Discussion: https://postgr.es/m/20190226061450.GA1665944@rfd.leadboat.com	2019-03-08 20:16:27 -08:00
Tom Lane	1b76168da7	Reformat catalog .dat files. Test run for my previous commit; cleans up formatting issues in some other recent commits.	2019-03-08 12:01:27 -05:00
Tom Lane	27aaf6eff4	Minor improvements for reformat_dat_file.pl. Use Getopt::Long in preference to hand-rolled option parsing code. Also, remove "-I .../backend/catalog" switch from the Makefile invocations. That's been unnecessary for some time, and leaving it there gives the false impression it's needed in manual invocations. John Naylor (extracted from a larger but more controversial patch) Discussion: https://postgr.es/m/CACPNZCsHdcQN2jQ1=ptbi1Co2Nj3aHgRCUMk62=ThgWNabPY+Q@mail.gmail.com	2019-03-08 11:48:49 -05:00
Tom Lane	1d33858406	Fix handling of targetlist SRFs when scan/join relation is known empty. When we introduced separate ProjectSetPath nodes for application of set-returning functions in v10, we inadvertently broke some cases where we're supposed to recognize that the result of a subquery is known to be empty (contain zero rows). That's because IS_DUMMY_REL was just looking for a childless AppendPath without allowing for a ProjectSetPath being possibly stuck on top. In itself, this didn't do anything much worse than produce slightly worse plans for some corner cases. Then in v11, commit `11cf92f6e` rearranged things to allow the scan/join targetlist to be applied directly to partial paths before they get gathered. But it inserted a short-circuit path for dummy relations that was a little too short: it failed to insert a ProjectSetPath node at all for a targetlist containing set-returning functions, resulting in bogus "set-valued function called in context that cannot accept a set" errors, as reported in bug #15669 from Madelaine Thibaut. The best way to fix this mess seems to be to reimplement IS_DUMMY_REL so that it drills down through any ProjectSetPath nodes that might be there (and it seems like we'd better allow for ProjectionPath as well). While we're at it, make it look at rel->pathlist not cheapest_total_path, so that it gives the right answer independently of whether set_cheapest has been done lately. That dependency looks pretty shaky in the context of code like apply_scanjoin_target_to_paths, and even if it's not broken today it'd certainly bite us at some point. (Nastily, unsafe use of the old coding would almost always work; the hazard comes down to possibly looking through a dangling pointer, and only once in a blue moon would you find something there that resulted in the wrong answer.) It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if there are any extensions using it, they'll continue to use the old inadequate logic until they're recompiled, after which they'll fail to load into server versions predating this fix. Hopefully there are few such extensions. Having fixed IS_DUMMY_REL, the special path for dummy rels in apply_scanjoin_target_to_paths is unnecessary as well as being wrong, so we can just drop it. Also change a few places that were testing for partitioned-ness of a planner relation but not using IS_PARTITIONED_REL for the purpose; that seems unsafe as well as inconsistent, plus it required an ugly hack in apply_scanjoin_target_to_paths. In passing, save a few cycles in apply_scanjoin_target_to_paths by skipping processing of pre-existing paths for partitioned rels, and do some cosmetic cleanup and comment adjustment in that function. I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking any code that might be using it, since in almost every case that would be wrong; IS_DUMMY_REL is what to be using instead. In HEAD, also make set_dummy_rel_pathlist static (since it's no longer used from outside allpaths.c), and delete is_dummy_plan, since it's no longer used anywhere. Back-patch as appropriate into v11 and v10. Tom Lane and Julien Rouhaud Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org	2019-03-07 14:22:13 -05:00
Robert Haas	898e5e3290	Allow ATTACH PARTITION with only ShareUpdateExclusiveLock. We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com	2019-03-07 11:13:12 -05:00
Thomas Munro	42210524cc	Remove useless header inclusion.	2019-03-07 20:02:22 +13:00
Thomas Munro	91595f9d49	Drop the vestigial "smgr" type. Before commit `3fa2bb31` this type appeared in the catalogs to select which of several block storage mechanisms each relation used. New features under development propose to revive the concept of different block storage managers for new kinds of data accessed via bufmgr.c, but don't need to put references to them in the catalogs. So, avoid useless maintenance work on this type by dropping it. Update some regression tests that were referencing it where any type would do. Discussion: https://postgr.es/m/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com	2019-03-07 15:44:04 +13:00
Andres Freund	277cb78983	Don't reuse slots between root and partition in ON CONFLICT ... UPDATE. Until now the the slot to store the conflicting tuple, and the result of the ON CONFLICT SET, where reused between partitions. That necessitated changing slots descriptor when switching partitions. Besides the overhead of switching descriptors on a slot (which requires memory allocations and prevents JITing), that's importantly also problematic for tableam. There individual partitions might belong to different tableams, needing different kinds of slots. In passing also fix ExecOnConflictUpdate to clear the existing slot at exit. Otherwise that slot could continue to hold a pin till the query ends, which could be far too long if the input data set is large, and there's no further conflicts. While previously also problematic, it's now more important as there will be more such slots when partitioned. Author: Andres Freund Reviewed-By: Robert Haas, David Rowley Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-03-06 15:43:33 -08:00
Andres Freund	b172342321	Fix copy/out/readfuncs for accessMethod addition in `8586bf7ed8`. This includes a catversion bump, as IntoClause is theoretically speaking part of storable rules. In practice I don't think that can happen, but there's no reason to be stingy here. Per buildfarm member calliphoridae.	2019-03-06 11:55:28 -08:00
Andres Freund	8586bf7ed8	tableam: introduce table AM infrastructure. This introduces the concept of table access methods, i.e. CREATE ACCESS METHOD ... TYPE TABLE and CREATE TABLE ... USING (storage-engine). No table access functionality is delegated to table AMs as of this commit, that'll be done in following commits. Subsequent commits will incrementally abstract table access functionality to be routed through table access methods. That change is too large to be reviewed & committed at once, so it'll be done incrementally. Docs will be updated at the end, as adding them incrementally would likely make them less coherent, and definitely is a lot more work, without a lot of benefit. Table access methods are specified similar to index access methods, i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with callbacks. In contrast to index AMs that struct needs to live as long as a backend, typically that's achieved by just returning a pointer to a constant struct. Psql's \d+ now displays a table's access method. That can be disabled with HIDE_TABLEAM=true, which is mainly useful so regression tests can be run against different AMs. It's quite possible that this behaviour still needs to be fine tuned. For now it's not allowed to set a table AM for a partitioned table, as we've not resolved how partitions would inherit that. Disallowing allows us to introduce, if we decide that's the way forward, such a behaviour without a compatibility break. Catversion bumped, to add the heap table AM and references to it. Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de	2019-03-06 09:54:38 -08:00
Alvaro Herrera	b96f6b1948	pg_partition_ancestors Adds another introspection feature for partitioning, necessary for further psql patches. Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/20190226222757.GA31622@alvherre.pgsql	2019-03-04 16:14:29 -03:00
Peter Eisentraut	278584b526	Remove volatile from latch API This was no longer useful since the latch functions use memory barriers already, which are also compiler barriers, and volatile does not help with cross-process access. Discussion: https://www.postgresql.org/message-id/flat/20190218202511.qsfpuj5sy4dbezcw%40alap3.anarazel.de#18783c27d73e9e40009c82f6e0df0974	2019-03-04 11:30:41 +01:00
Andres Freund	ad0bda5d24	Store tuples for EvalPlanQual in slots, rather than as HeapTuples. For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from a their native format into HeapTuples. Instead use slots to manage epq tuples. To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow, to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon. To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots. The behaviour of es_epqTupleSet[rti -1] is now checked by es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a slot containing an empty tuple. Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-03-01 10:37:57 -08:00
Tom Lane	c94fb8e8ac	Standardize some more loops that chase down parallel lists. We have forboth() and forthree() macros that simplify iterating through several parallel lists, but not everyplace that could reasonably use those was doing so. Also invent forfour() and forfive() macros to do the same for four or five parallel lists, and use those where applicable. The immediate motivation for doing this is to reduce the number of ad-hoc lnext() calls, to reduce the footprint of a WIP patch. However, it seems like good cleanup and error-proofing anyway; the places that were combining forthree() with a manually iterated loop seem particularly illegible and bug-prone. There was some speculation about restructuring related parsetree representations to reduce the need for parallel list chasing of this sort. Perhaps that's a win, or perhaps not, but in any case it would be considerably more invasive than this patch; and it's not particularly related to my immediate goal of improving the List infrastructure. So I'll leave that question for another day. Patch by me; thanks to David Rowley for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us	2019-02-28 14:25:01 -05:00
Andres Freund	ff11e7f4b9	Use slots in trigger infrastructure, except for the actual invocation. In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler. As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project. As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are now also now created on-demand, which is good both from an efficiency POV, but also makes the modifying code simpler. Author: Andres Freund, Amit Khandekar and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-02-26 20:31:38 -08:00
Andres Freund	b8d71745ea	Store table oid and tuple's tid in tuple slots directly. After the introduction of tuple table slots all table AMs need to support returning the table oid of the tuple stored in a slot created by said AM. It does not make sense to re-implement that in every AM, therefore move handling of table OIDs into the TupleTableSlot structure itself. It's possible that we, at a later date, might want to get rid of HeapTupleData.t_tableOid entirely, but doing so before the abstractions for table AMs are integrated turns out to be too hard, so delay that for now. Similarly, every AM needs to support the concept of a tuple identifier (tid / item pointer) for its tuples. It's quite possible that we'll generalize the exact form of a tid at a future point (to allow for things like index organized tables), but for now many parts of the code know about tids, so there's not much point in abstracting tids away. Therefore also move into slot (rather than providing API to set/get the tid associated with the tuple in a slot). Once table AM includes insert/updating/deleting tuples, the responsibility to set the correct tid after such an action will move into that. After that change, code doing such modifications, should not have to deal with HeapTuples directly anymore. Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-02-26 20:31:16 -08:00
Andres Freund	5408e233f0	Allow to use HeapTupleData embedded in [Buffer]HeapTupleTableSlot. That avoids having to care about the lifetime of the HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That doesn't make a huge difference for a plain HeapTupleTableSlot, but for BufferHeapTupleTableSlot it can be a significant advantage, avoiding the need to materialize slots where it's inconvenient to provide a HeapTupleData with appropriate lifetime to point to the on-disk tuple. It's quite possible that we'll want to add support functions for constructing HeapTuples using that embedded HeapTupleData, but for now callers do so themselves. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-02-26 18:15:59 -08:00
Andres Freund	8aa02b52db	Add ExecStorePinnedBufferHeapTuple. This allows to avoid an unnecessary pin/unpin cycle when storing a tuple in an already pinned buffer into a slot, when the pin isn't further needed at the call site. Only a single caller for now (to ensure coverage), but upcoming patches will increase use of the new function. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-02-26 17:59:01 -08:00
Peter Geoghegan	2ab23445bc	Remove unneeded argument from _bt_getstackbuf(). _bt_getstackbuf() is called at exactly two points following commit `efada2b8e9` (one call site is concerned with page splits, while the other is concerned with page deletion). The parent buffer returned by _bt_getstackbuf() is write-locked in both cases. Remove the 'access' argument and make _bt_getstackbuf() assume that callers require a write-lock.	2019-02-25 17:47:43 -08:00
Peter Eisentraut	bc09d5e4cc	Remove unnecessary use of PROCEDURAL Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword before LANGUAGE. We mostly don't use this anymore, so some of these look a bit old. There is still some use in pg_dump, which is harder to remove because it's baked into the archive format, so I'm not touching that. Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com	2019-02-25 08:38:59 +01:00
Michael Paquier	effe7d9552	Make release of 2PC identifier and locks consistent in COMMIT PREPARED When preparing a transaction in two-phase commit, a dummy PGPROC entry holding the GID used for the transaction is registered, which gets released once COMMIT PREPARED is run. Prior releasing its shared memory state, all the locks taken in the prepared transaction are released using a dedicated set of callbacks (pgstat and multixact having similar callbacks), which may cause the locks to be released before the GID is set free. Hence, there is a small window where lock conflicts could happen, for example: - Transaction A releases its locks, still holding its GID in shared memory. - Transaction B held a lock which conflicted with locks of transaction A. - Transaction B continues its processing, reusing the same GID as transaction A. - Transaction B fails because of a conflicting GID, already in use by transaction A. This commit changes the shared memory state release so as post-commit callbacks and predicate lock cleanup happen consistently with the shared memory state cleanup for the dummy PGPROC entry. The race window is small and 2PC had this issue from the start, so no backpatch is done. On top if that fixes discussed involved ABI breakages, which are not welcome in stable branches. Reported-by: Oleksii Kliukin, Ildar Musin Diagnosed-by: Oleksii Kliukin, Ildar Musin Author: Michael Paquier Reviewed-by: Masahiko Sawada, Oleksii Kliukin Discussion: https://postgr.es/m/BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6@hintbits.com	2019-02-25 14:19:34 +09:00
Tom Lane	0c7d537930	Move estimate_hashagg_tablesize to selfuncs.c, and widen result to double. It seems to make more sense for this to be in selfuncs.c, since it's largely a statistical-estimation thing, and it's related to other functions like estimate_hash_bucket_stats that are there. While at it, change the result type from Size to double. Perhaps at one point it was impossible for the result to overflow an integer, but I've got no confidence in that proposition anymore. Nothing's actually done with the result except to compare it to a work_mem-based limit, so as long as we don't get an overflow on the way to that comparison, things should be fine even with very large dNumGroups. Code movement proposed by Antonin Houska, type change by me Discussion: https://postgr.es/m/25767.1549359615@localhost	2019-02-21 14:59:12 -05:00

1 2 3 4 5 ...

8695 commits