postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-29 02:00:35 -04:00

Author	SHA1	Message	Date
Peter Geoghegan	6c6b497266	Revert "Add eager and lazy freezing strategies to VACUUM." This reverts commit `4d41799261`. Broad concerns about regressions caused by eager freezing strategy have been raised. Whether or not these concerns can be worked through in any time frame is far from certain. Discussion: https://postgr.es/m/20230126004347.gepcmyenk2csxrri@awork3.anarazel.de	2023-01-25 22:22:27 -08:00
Peter Geoghegan	4d41799261	Add eager and lazy freezing strategies to VACUUM. Eager freezing strategy avoids large build-ups of all-visible pages. It makes VACUUM trigger page-level freezing whenever doing so will enable the page to become all-frozen in the visibility map. This is useful for tables that experience continual growth, particularly strict append-only tables such as pgbench's history table. Eager freezing significantly improves performance stability by spreading out the cost of freezing over time, rather than doing most freezing during aggressive VACUUMs. It complements the insert autovacuum mechanism added by commit `b07642db`. VACUUM determines its freezing strategy based on the value of the new vacuum_freeze_strategy_threshold GUC (or reloption) with logged tables. Tables that exceed the size threshold use the eager freezing strategy. Unlogged tables and temp tables always use eager freezing strategy, since the added cost is negligible there. Non-permanent relations won't incur any extra overhead in WAL written (for the obvious reason), nor in pages dirtied (since any extra freezing will only take place on pages whose PD_ALL_VISIBLE bit needed to be set either way). VACUUM uses lazy freezing strategy for logged tables that fall under the GUC size threshold. Page-level freezing triggers based on the criteria established in commit `1de58df4`, which added basic page-level freezing. Eager freezing is strictly more aggressive than lazy freezing. Settings like vacuum_freeze_min_age still get applied in just the same way in every VACUUM, independent of the strategy in use. The only mechanical difference between eager and lazy freezing strategies is that only the former applies its own additional criteria to trigger freezing pages. Note that even lazy freezing strategy will trigger freezing whenever a page happens to have required that an FPI be written during pruning, provided that the page will thereby become all-frozen in the visibility map afterwards (due to the FPI optimization from commit `1de58df4`). The vacuum_freeze_strategy_threshold default setting is 4GB. This is a relatively low setting that prioritizes performance stability. It will be reviewed at the end of the Postgres 16 beta period. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com	2023-01-25 14:15:38 -08:00
Robert Haas	f1358ca52d	Adjust interaction of CREATEROLE with role properties. Previously, a CREATEROLE user without SUPERUSER could not alter REPLICATION users in any way, and could not set the BYPASSRLS attribute. However, they could manipulate the CREATEDB property even if they themselves did not possess it. With this change, a CREATEROLE user without SUPERUSER can set or clear the REPLICATION, BYPASSRLS, or CREATEDB property on a new role or a role that they have rights to manage if and only if that property is set for their own role. This implements the standard idea that you can't give permissions you don't have (but you can give the ones you do have). We might in the future want to provide more powerful ways to constrain what a CREATEROLE user can do - for example, to limit whether CONNECTION LIMIT can be set or the values to which it can be set - but that is left as future work. Patch by me, reviewed by Nathan Bossart, Tushar Ahuja, and Neha Sharma. Discussion: http://postgr.es/m/CA+TgmobX=LHg_J5aT=0pi9gJy=JdtrUVGAu0zhr-i5v5nNbJDg@mail.gmail.com	2023-01-24 10:57:09 -05:00
Jeff Davis	ff9618e82a	Fix MAINTAIN privileges for toast tables and partitions. Commit `60684dd8` left loose ends when it came to maintaining toast tables or partitions. For toast tables, simply skip the privilege check if the toast table is an indirect target of the maintenance command, because the main table privileges have already been checked. For partitions, allow the maintenance command if the user has the MAINTAIN privilege on the partition or any parent. Also make CLUSTER emit "skipping" messages when the user doesn't have privileges, similar to VACUUM. Author: Nathan Bossart Reported-by: Pavel Luzanov Reviewed-by: Pavel Luzanov, Ted Yu Discussion: https://postgr.es/m/20230113231339.GA2422750@nathanxps13	2023-01-14 00:16:23 -08:00
Robert Haas	e5b8a4c098	Add new GUC createrole_self_grant. Can be set to the empty string, or to either or both of "set" or "inherit". If set to a non-empty value, a non-superuser who creates a role (necessarily by relying up the CREATEROLE privilege) will grant that role back to themselves with the specified options. This isn't a security feature, because the grant that this feature triggers can also be performed explicitly. Instead, it's a user experience feature. A superuser would necessarily inherit the privileges of any created role and be able to access all such roles via SET ROLE; with this patch, you can configure createrole_self_grant = 'set, inherit' to provide a similar experience for a user who has CREATEROLE but not SUPERUSER. Discussion: https://postgr.es/m/CA+TgmobN59ct+Emmz6ig1Nua2Q-_o=r6DSD98KfU53kctq_kQw@mail.gmail.com	2023-01-10 12:44:49 -05:00
Amit Kapila	216a784829	Perform apply of large transactions by parallel workers. Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com	2023-01-09 07:52:45 +05:30
Tom Lane	a46a7011b2	Add options to control whether VACUUM runs vac_update_datfrozenxid. VACUUM normally ends by running vac_update_datfrozenxid(), which requires a scan of pg_class. Therefore, if one attempts to vacuum a database one table at a time --- as vacuumdb has done since v12 --- we will spend O(N^2) time in vac_update_datfrozenxid(). That causes serious performance problems in databases with tens of thousands of tables, and indeed the effect is measurable with only a few hundred. To add insult to injury, only one process can run vac_update_datfrozenxid at the same time per DB, so this behavior largely defeats vacuumdb's -j option. Hence, invent options SKIP_DATABASE_STATS and ONLY_DATABASE_STATS to allow applications to postpone vac_update_datfrozenxid() until the end of a series of VACUUM requests, and teach vacuumdb to use them. Per bug #17717 from Gunnar L. Sadly, this answer doesn't seem like something we'd consider back-patching, so the performance problem will remain in v12-v15. Tom Lane and Nathan Bossart Discussion: https://postgr.es/m/17717-6c50eb1c7d23a886@postgresql.org	2023-01-06 14:17:25 -05:00
Bruce Momjian	c8e1ba736b	Update copyright for 2023 Backpatch-through: 11	2023-01-02 15:00:37 -05:00
Peter Geoghegan	4ce3afb82e	Refactor how VACUUM passes around its XID cutoffs. Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const. Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing. Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com	2022-12-22 09:37:59 -08:00
Jeff Davis	60684dd834	Add grantable MAINTAIN privilege and pg_maintain role. Allows VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE. Effectively reverts `4441fc704d`. Instead of creating separate privileges for VACUUM, ANALYZE, and other maintenance commands, group them together under a single MAINTAIN privilege. Author: Nathan Bossart Discussion: https://postgr.es/m/20221212210136.GA449764@nathanxps13 Discussion: https://postgr.es/m/45224.1670476523@sss.pgh.pa.us	2022-12-13 17:33:28 -08:00
Alvaro Herrera	a61b1f7482	Rework query relation permission checking Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range range. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out. This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking. To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter. ExecutorCheckPerms_hook has gained another List * argument; the signature is now: typedef bool (ExecutorCheckPerms_hook_type) (List rangeTable, List *rtePermInfos, bool ereport_on_violation); The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com	2022-12-06 16:09:24 +01:00
Andrew Dunstan	b5d6382496	Provide per-table permissions for vacuum and analyze. Currently a table can only be vacuumed or analyzed by its owner or a superuser. This can now be extended to any user by means of an appropriate GRANT. Nathan Bossart Reviewed by: Bharath Rupireddy, Kyotaro Horiguchi, Stephen Frost, Robert Haas, Mark Dilger, Tom Lane, Corey Huinker, David G. Johnston, Michael Paquier. Discussion: https://postgr.es/m/20220722203735.GB3996698@nathanxps13	2022-11-28 12:08:14 -05:00
Peter Geoghegan	b6074846ce	Simplify vacuum_set_xid_limits() signature. Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits() directly, rather than passing most individual VacuumParams fields as separate arguments. Also make vacuum_set_xid_limits() output parameter symbol names match those used by its vacuumlazy.c caller. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com	2022-11-23 11:10:06 -08:00
Tom Lane	34fa0ddae5	Fix CREATE DATABASE so we can pg_upgrade DBs with OIDs above 2^31. Commit `aa0105141` repeated one of the oldest mistakes in our book: thinking that OID is the same as int32. It isn't of course, and unsurprisingly the first person who came along with a database OID above 2 billion broke it. Repair. Per bug #17677 from Sergey Pankov. Back-patch to v15. Discussion: https://postgr.es/m/17677-a99fa067d7ed71c9@postgresql.org	2022-11-04 10:39:52 -04:00
Etsuro Fujita	97da48246d	Allow batch insertion during COPY into a foreign table. Commit `3d956d956` allowed the COPY, but it's done by inserting individual rows to the foreign table, so it can be inefficient due to the overhead caused by each round-trip to the foreign server. To improve performance of the COPY in such a case, this patch allows batch insertion, by extending the multi-insert machinery in CopyFrom() to the foreign-table case so that we insert multiple rows to the foreign table at once using the FDW callback routine added by commit `b663a4136`. This patch also allows this for postgres_fdw. It is enabled by the "batch_size" option added by commit `b663a4136`, which is disabled by default. When doing batch insertion, we update progress of the COPY command after performing the FDW callback routine, to count rows not suppressed by the FDW as well as a BEFORE ROW INSERT trigger. For consistency, this patch changes the timing of updating it for plain tables: previously, we updated it immediately after adding each row to the multi-insert buffer, but we do so only after writing the rows stored in the buffer out to the table using table_multi_insert(), which I think would be consistent even with non-batching mode, because in that mode we update it after writing each row out to the table using table_tuple_insert(). Andrey Lepikhov, heavily revised by me, with review from Ian Barwick, Andrey Lepikhov, and Zhihong Yu. Discussion: https://postgr.es/m/bc489202-9855-7550-d64c-ad2d83c24867%40postgrespro.ru	2022-10-13 18:45:00 +09:00
Michael Paquier	9fcdf2c787	Add support for COPY TO callback functions This is useful as a way for extensions to process COPY TO rows in the way they see fit (say auditing, analytics, backend, etc.) without the need to invoke an external process running as the OS user running the backend through PROGRAM that requires superuser rights. COPY FROM already provides a similar callback for logical replication. For COPY TO, the callback is triggered when we are ready to send a row in CopySendEndOfRow(), which is the same code path as when sending a row to a frontend or a pipe/file. A small test module, test_copy_callbacks, is added to provide some coverage for this facility. Author: Bilva Sanaba, Nathan Bossart Discussion: https://postgr.es/m/253C21D1-FCEB-41D9-A2AF-E6517015B7D7@amazon.com	2022-10-11 11:45:52 +09:00
Peter Geoghegan	a601366a46	Harmonize more parameter names in bulk. Make sure that function declarations use names that exactly match the corresponding names from function definitions in optimizer, parser, utility, libpq, and "commands" code, as well as in remaining library code. Do the same for all code related to frontend programs (with the exception of pg_dump/pg_dumpall related code). Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will handle ecpg and pg_dump/pg_dumpall. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-20 13:09:30 -07:00
Peter Geoghegan	bfcf1b3480	Harmonize parameter names in storage and AM code. Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 19:18:36 -07:00
Peter Geoghegan	4bac9600f0	Harmonize heapam and tableam parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 16:46:23 -07:00
Tom Lane	0a20ff54f5	Split up guc.c for better build speed and ease of maintenance. guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us	2022-09-13 11:11:45 -04:00
Peter Geoghegan	c3ffa731a5	Derive freeze cutoff from nextXID, not OldestXmin. Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs to freeze were determined at the start of each VACUUM by taking related cutoffs that represent which XIDs/MXIDs VACUUM should treat as still running, and subtracting an XID/MXID age based value controlled by GUCs like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff) was derived by subtracting an XID age value from OldestXmin, while the MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting an MXID age value from OldestMxact. This approach didn't match the approach used nearby to determine whether this VACUUM operation should be an aggressive VACUUM or not. VACUUM now uses the standard approach instead: it subtracts the same age-based values from next XID/next MXID (rather than subtracting from OldestXmin/OldestMxact). This approach is simpler and more uniform. Most of the time it will have only a negligible impact on how and when VACUUM freezes. It will occasionally make VACUUM more robust in the event of problems caused by long running transaction. These are cases where OldestXmin and OldestMxact are held back by so much that they attain an age that is a significant fraction of the value of age-based settings like vacuum_freeze_min_age. There is no principled reason why freezing should be affected in any way by the presence of a long-running transaction -- at least not before the point that the OldestXmin and OldestMxact limits used by each VACUUM operation attain an age that makes it unsafe to freeze some of the XIDs/MXIDs whose age exceeds the value of the relevant age-based settings. The new approach should at least make freezing degrade more gracefully than before, even in the most extreme cases. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com	2022-08-31 11:37:35 -07:00
Robert Haas	e3ce2de09d	Allow grant-level control of role inheritance behavior. The GRANT statement can now specify WITH INHERIT TRUE or WITH INHERIT FALSE to control whether the member inherits the granted role's permissions. For symmetry, you can now likewise write WITH ADMIN TRUE or WITH ADMIN FALSE to turn ADMIN OPTION on or off. If a GRANT does not specify WITH INHERIT, the behavior based on whether the member role is marked INHERIT or NOINHERIT. This means that if all roles are marked INHERIT or NOINHERIT before any role grants are performed, the behavior is identical to what we had before; otherwise, it's different, because ALTER ROLE [NO]INHERIT now only changes the default behavior of future grants, and has no effect on existing ones. Patch by me. Reviewed and testing by Nathan Bossart and Tushar Ahuja, with design-level comments from various others. Discussion: http://postgr.es/m/CA+Tgmoa5Sf4PiWrfxA=sGzDKg0Ojo3dADw=wAHOhR9dggV=RmQ@mail.gmail.com	2022-08-25 10:06:02 -04:00
Michael Paquier	70d25bf70c	Fix comment in copyfrom_internal.h COPY_NEW_FE has become COPY_FRONTEND in `3174d69`, that has removed the frontend-backend protocol v2. Author: Zhang Mingli Discussion: https://postgr.es/m/aeec12a3-501e-449a-8d25-901c4ff706cf@Spark	2022-08-06 21:13:45 +09:00
Alvaro Herrera	ec0925c22a	Fix ENABLE/DISABLE TRIGGER to handle recursion correctly Using ATSimpleRecursion() in ATPrepCmd() to do so as `bbb927b4db` did is not correct, because ATPrepCmd() can't distinguish between triggers that may be cloned and those that may not, so would wrongly try to recurse for the latter category of triggers. So this commit restores the code in EnableDisableTrigger() that `86f575948c` had added to do the recursion, which would do it only for triggers that may be cloned, that is, row-level triggers. This also changes tablecmds.c such that ATExecCmd() is able to pass the value of ONLY flag down to EnableDisableTrigger() using its new 'recurse' parameter. This also fixes what seems like an oversight of `86f575948c` that the recursion to partition triggers would only occur if EnableDisableTrigger() had actually changed the trigger. It is more apt to recurse to inspect partition triggers even if the parent's trigger didn't need to be changed: only then can we be certain that all descendants share the same state afterwards. Backpatch all the way back to 11, like `bbb927b4db`. Care is taken not to break ABI compatibility (and that no catversion bump is needed.) Co-authored-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru> Discussion: https://postgr.es/m/CA+HiwqG-cZT3XzGAnEgZQLoQbyfJApVwOTQaCaas1mhpf+4V5A@mail.gmail.com	2022-08-05 09:47:26 +02:00
Robert Haas	b0a55e4329	Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com	2022-07-06 11:39:09 -04:00
Michael Paquier	d4bfe41281	autho_explain: Add GUC to log query parameters auto_explain.log_parameter_max_length is a new GUC part of the extension, similar to the corresponding core setting, that controls the inclusion of query parameters in the logged explain output. More tests are added to check the behavior of this new parameter: when parameters logged in full (the default of -1), when disabled (value of 0) and when partially truncated (value different than the two others). Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/87ee09mohb.fsf@wibble.ilmari.org	2022-07-06 09:55:30 +09:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Michael Paquier	b940918dc8	Remove "recheck" argument from check_index_is_clusterable() The last usage of this argument in this routine can be tracked down to `7e2f9062`, aka 11 years ago. Getting rid of this argument can also be an advantage for extensions calling check_index_is_clusterable(), as it removes any need to worry about the meaning of what a recheck would be at this level. Author: Justin Pryzby Discussion: https://postgr.es/m/20220411140609.GF26620@telsasoft.com	2022-04-13 15:32:35 +09:00
Robert Haas	8ec569479f	Apply PGDLLIMPORT markings broadly. Up until now, we've had a policy of only marking certain variables in the PostgreSQL header files with PGDLLIMPORT, but now we've decided to mark them all. This means that extensions running on Windows should no longer operate at a disadvantage as compared to extensions running on Linux: if the variable is present in a header file, it should be accessible. Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com	2022-04-08 08:16:38 -04:00
Tomas Vondra	2c7ea57e56	Revert "Logical decoding of sequences" This reverts a sequence of commits, implementing features related to logical decoding and replication of sequences: - `0da92dc530` - `80901b3291` - `b779d7d8fd` - `d5ed9da41d` - `a180c2b34d` - `75b1521dae` - `2d2232933b` - `002c9dd97a` - `05843b1aa4` The implementation has issues, mostly due to combining transactional and non-transactional behavior of sequences. It's not clear how this could be fixed, but it'll require reworking significant part of the patch. Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com	2022-04-07 20:06:36 +02:00
Peter Eisentraut	344d62fb9a	Unlogged sequences Add support for unlogged sequences. Unlike for unlogged tables, this is not a performance feature. It allows sequences associated with unlogged tables to be excluded from replication. A new subcommand ALTER SEQUENCE ... SET LOGGED/UNLOGGED is added. An identity/serial sequence now automatically gets and follows the persistence level (logged/unlogged) of its owning table. (The sequences owned by temporary tables were already temporary through the separate mechanism in RangeVarAdjustRelationPersistence().) But you can still change the persistence of an owned sequence separately. Also, pg_dump and pg_upgrade preserve the persistence of existing sequences. Discussion: https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com	2022-04-07 16:18:00 +02:00
Peter Geoghegan	0b018fabaa	Set relfrozenxid to oldest extant XID seen by VACUUM. When VACUUM set relfrozenxid before now, it set it to whatever value was used to determine which tuples to freeze -- the FreezeLimit cutoff. This approach was very naive. The relfrozenxid invariant only requires that new relfrozenxid values be <= the oldest extant XID remaining in the table (at the point that the VACUUM operation ends), which in general might be much more recent than FreezeLimit. VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it goes (the oldest remaining values _after_ lazy_scan_prune processing). The final values are set as the table's new relfrozenxid and new relminmxid in pg_class at the end of each VACUUM. The oldest XID might come from a tuple's xmin, xmax, or xvac fields. It might even come from one of the table's remaining MultiXacts. Final relfrozenxid values must still be >= FreezeLimit in an aggressive VACUUM (FreezeLimit still acts as a lower bound on the final value that aggressive VACUUM can set relfrozenxid to). Since standard VACUUMs still make no guarantees about advancing relfrozenxid, they might as well set relfrozenxid to a value from well before FreezeLimit when the opportunity presents itself. In general standard VACUUMs may now set relfrozenxid to any value > the original relfrozenxid and <= OldestXmin. Credit for the general idea of using the oldest extant XID to set pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com	2022-04-03 09:57:21 -07:00
Alvaro Herrera	cfdd03f45e	Allow CLUSTER on partitioned tables This is essentially the same as applying VACUUM FULL to a partitioned table, which has been supported since commit `3c3bb99330` (March 2017). While there's no great use case in applying CLUSTER to partitioned tables, we don't have any strong reason not to allow it either. For now, partitioned indexes cannot be marked clustered, so an index must always be specified. While at it, rename some variables that were RangeVars during the development that led to `8bc717cb88` but never made it that way to the source tree; there's no need to perpetuate names that have always been more confusing than helpful. Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/20201028003312.GU9241@telsasoft.com Discussion: https://postgr.es/m/20200611153502.GT14879@telsasoft.com	2022-04-02 19:08:34 +02:00
Peter Eisentraut	072132f04e	Add header matching mode to COPY FROM COPY FROM supports the HEADER option to silently discard the header line from a CSV or text file. It is possible to load by mistake a file that matches the expected format, for example, if two text columns have been swapped, resulting in garbage in the database. This adds a new option value HEADER MATCH that checks the column names in the header line against the actual column names and errors out if they do not match. Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com	2022-03-30 09:02:31 +02:00
Robert Haas	9c08aea6a3	Add new block-by-block strategy for CREATE DATABASE. Because this strategy logs changes on a block-by-block basis, it avoids the need to checkpoint before and after the operation. However, because it logs each changed block individually, it might generate a lot of extra write-ahead logging if the template database is large. Therefore, the older strategy remains available via a new STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy option to createdb. Somewhat controversially, this patch assembles the list of relations to be copied to the new database by reading the pg_class relation of the template database. Cross-database access like this isn't normally possible, but it can be made to work here because there can't be any connections to the database being copied, nor can it contain any in-doubt transactions. Even so, we have to use lower-level interfaces than normal, since the table scan and relcache interfaces will not work for a database to which we're not connected. The advantage of this approach is that we do not need to rely on the filesystem to determine what ought to be copied, but instead on PostgreSQL's own knowledge of the database structure. This avoids, for example, copying stray files that happen to be located in the source database directory. Dilip Kumar, with a fairly large number of cosmetic changes by me. Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor, Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian, Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others. Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com	2022-03-29 11:48:36 -04:00
Alvaro Herrera	7103ebb7aa	Add support for MERGE SQL command MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targetting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql	2022-03-28 16:47:48 +02:00
Tomas Vondra	923def9a53	Allow specifying column lists for logical replication This allows specifying an optional column list when adding a table to logical replication. The column list may be specified after the table name, enclosed in parentheses. Columns not included in this list are not sent to the subscriber, allowing the schema on the subscriber to be a subset of the publisher schema. For UPDATE/DELETE publications, the column list needs to cover all REPLICA IDENTITY columns. For INSERT publications, the column list is arbitrary and may omit some REPLICA IDENTITY columns. Furthermore, if the table uses REPLICA IDENTITY FULL, column list is not allowed. The column list can contain only simple column references. Complex expressions, function calls etc. are not allowed. This restriction could be relaxed in the future. During the initial table synchronization, only columns included in the column list are copied to the subscriber. If the subscription has several publications, containing the same table with different column lists, columns specified in any of the lists will be copied. This means all columns are replicated if the table has no column list at all (which is treated as column list with all columns), or when of the publications is defined as FOR ALL TABLES (possibly IN SCHEMA that matches the schema of the table). For partitioned tables, publish_via_partition_root determines whether the column list for the root or the leaf relation will be used. If the parameter is 'false' (the default), the list defined for the leaf relation is used. Otherwise, the column list for the root partition will be used. Psql commands \dRp+ and \d <table-name> now display any column lists. Author: Tomas Vondra, Alvaro Herrera, Rahila Syed Reviewed-by: Peter Eisentraut, Alvaro Herrera, Vignesh C, Ibrar Ahmed, Amit Kapila, Hou zj, Peter Smith, Wang wei, Tang, Shi yu Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com	2022-03-26 01:01:27 +01:00
Tomas Vondra	75b1521dae	Add decoding of sequences to built-in replication This commit adds support for decoding of sequences to the built-in replication (the infrastructure was added by commit `0da92dc530`). The syntax and behavior mostly mimics handling of tables, i.e. a publication may be defined as FOR ALL SEQUENCES (replicating all sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating all sequences in a particular schema) or individual sequences. To publish sequence modifications, the publication has to include 'sequence' action. The protocol is extended with a new message, describing sequence increments. A new system view pg_publication_sequences lists all the sequences added to a publication, both directly and indirectly. Various psql commands (\d and \dRp) are improved to also display publications including a given sequence, or sequences included in a publication. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres Freund, Petr Jelinek Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca	2022-03-24 18:49:27 +01:00
Alvaro Herrera	ba9a7e3921	Enforce foreign key correctly during cross-partition updates When an update on a partitioned table referenced in foreign key constraints causes a row to move from one partition to another, the fact that the move is implemented as a delete followed by an insert on the target partition causes the foreign key triggers to have surprising behavior. For example, a given foreign key's delete trigger which implements the ON DELETE CASCADE clause of that key will delete any referencing rows when triggered for that internal DELETE, although it should not, because the referenced row is simply being moved from one partition of the referenced root partitioned table into another, not being deleted from it. This commit teaches trigger.c to skip queuing such delete trigger events on the leaf partitions in favor of an UPDATE event fired on the root target relation. Doing so is sensible because both the old and the new tuple "logically" belong to the root relation. The after trigger event queuing interface now allows passing the source and the target partitions of a particular cross-partition update when registering the update event for the root partitioned table. Along with the two ctids of the old and the new tuple, the after trigger event now also stores the OIDs of those partitions. The tuples fetched from the source and the target partitions are converted into the root table format, if necessary, before they are passed to the trigger function. The implementation currently has a limitation that only the foreign keys pointing into the query's target relation are considered, not those of its sub-partitioned partitions. That seems like a reasonable limitation, because it sounds rare to have distinct foreign keys pointing to sub-partitioned partitions instead of to the root table. This misbehavior stems from commit `f56f8f8da6` (which added support for foreign keys to reference partitioned tables) not paying sufficient attention to commit `2f17844104` (which had introduced cross-partition updates a year earlier). Even though the former commit goes back to Postgres 12, we're not backpatching this fix at this time for fear of destabilizing things too much, and because there are a few ABI breaks in it that we'd have to work around in older branches. It also depends on commit `f4566345cf`, which had its own share of backpatchability issues as well. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Eduard Català <eduard.catala@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com	2022-03-20 18:43:40 +01:00
Amit Kapila	52e4f0cd47	Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com	2022-02-22 08:11:50 +05:30
Peter Eisentraut	37851a8b83	Database-level collation version tracking This adds to database objects the same version tracking that collation objects have. There is a new pg_database column datcollversion that stores the version, a new function pg_database_collation_actual_version() to get the version from the operating system, and a new subcommand ALTER DATABASE ... REFRESH COLLATION VERSION. This was not originally added together with pg_collation.collversion, since originally version tracking was only supported for ICU, and ICU on a database-level is not currently supported. But we now have version tracking for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so this is useful to have now. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com	2022-02-14 08:27:26 +01:00
Peter Geoghegan	efa4a9462a	Consolidate VACUUM xid cutoff logic. Push the logic for determining whether or not a VACUUM operation will be aggressive down into vacuum_set_xid_limits(). This makes the function's signature significantly simpler, and seems clearer overall. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com	2022-02-11 18:26:15 -08:00
Peter Geoghegan	872770fd6c	Add VACUUM instrumentation for scanned pages, relfrozenxid. Report on scanned pages within VACUUM VERBOSE and autovacuum logging. These are pages that were physically examined during the VACUUM operation. Note that this can include a small number of pages that were marked all-visible in the visibility map by some earlier VACUUM operation. VACUUM won't skip all-visible pages that aren't part of a range of all-visible pages that's at least 32 blocks in length (partly to avoid missing out on opportunities to advance relfrozenxid during non-aggressive VACUUMs). Commit `44fa8488` simplified the definition of scanned pages. It became the complement of the pages (of those pages from rel_pages) that were skipped using the visibility map. And so scanned pages precisely indicates how effective the visibility map was at saving work. (Before now we displayed the number of pages skipped via the visibility map when happened to be frozen pages, but not when they were merely all-visible, which was less useful to users.) Rename the user-visible OldestXmin output field to "removal cutoff", and show some supplementary information: how far behind the cutoff is (number of XIDs behind) by the time the VACUUM operation finished. This will help users to figure out what's _not_ working in extreme cases where VACUUM is fundamentally unable to remove dead tuples or freeze older tuples (e.g., due to a leaked replication slot). Also report when relfrozenxid is advanced by VACUUM in output that immediately follows "removal cutoff". This structure is intended to highlight the relationship between the new relfrozenxid value for the table, and the VACUUM operation's removal cutoff. Finally, add instrumentation of "missed dead tuples", and the number of pages that had at least one such tuple. These are fully DEAD (not just RECENTLY_DEAD) tuples with storage that could not be pruned due to failure to acquire a cleanup lock on a heap page. This is a replacement for the "skipped due to pin" instrumentation removed by commit `44fa8488`. It shows more details than before for pages where failing to get a cleanup lock actually resulted in VACUUM missing out on useful work, but usually shows nothing at all instead (the mere fact that we couldn't get a cleanup lock is usually of no consequence whatsoever now). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com	2022-02-11 16:48:40 -08:00
Tomas Vondra	0da92dc530	Logical decoding of sequences This extends the logical decoding to also decode sequence increments. We differentiate between sequences created in the current (in-progress) transaction, and sequences created earlier. This mixed behavior is necessary because while sequences are not transactional (increments are not subject to ROLLBACK), relfilenode changes are. So we do this: * Changes for sequences created in the same top-level transaction are treated as transactional, i.e. just like any other change from that transaction, and discarded in case of a rollback. * Changes for sequences created earlier are applied immediately, as if performed outside any transaction. This applies also after ALTER SEQUENCE, which may create a new relfilenode. Moreover, if we ever get support for DDL replication, the sequence won't exist until the transaction gets applied. Sequences created in the current transaction are tracked in a simple hash table, identified by a relfilenode. That means a sequence may already exist, but if a transaction does ALTER SEQUENCE then the increments for the new relfilenode will be treated as transactional. For each relfilenode we track the XID of (sub)transaction that created it, which is needed for cleanup at transaction end. We don't need to check the XID to decide if an increment is transactional - if we find a match in the hash table, it has to be the same transaction. This requires two minor changes to WAL-logging. Firstly, we need to ensure the sequence record has a valid XID - until now the the increment might have XID 0 if it was the first change in a subxact. But the sequence might have been created in the same top-level transaction. So we ensure the XID is assigned when WAL-logging increments. The other change is addition of "created" flag, marking increments for newly created relfilenodes. This makes it easier to maintain the hash table of sequences that need transactional handling. Note: This is needed because of subxacts. A XID 0 might still have the sequence created in a different subxact of the same top-level xact. This does not include any changes to test_decoding and/or the built-in replication - those will be committed in separate patches. A patch adding decoding of sequences was originally submitted by Cary Huang. This commit reworks various important aspects (e.g. the WAL logging and transactional/non-transactional handling). However, the original patch and reviews were very useful. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca	2022-02-10 18:43:51 +01:00
Peter Eisentraut	43f33dc018	Add HEADER support to COPY text format The COPY CSV format supports the HEADER option to output a header line. This patch adds the same option to the default text format. On input, the HEADER option causes the first line to be skipped, same as with CSV. Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr> Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com	2022-01-28 09:44:47 +01:00
Tomas Vondra	269b532aef	Add stxdinherit flag to pg_statistic_ext_data Add pg_statistic_ext_data.stxdinherit flag, so that for each extended statistics definition we can store two versions of data - one for the relation alone, one for the whole inheritance tree. This is analogous to pg_statistic.stainherit, but we failed to include such flag in catalogs for extended statistics, and we had to work around it (see commits `859b3003de`, `36c4bc6e72` and `20b9fa308e`). This changes the relationship between the two catalogs storing extended statistics objects (pg_statistic_ext and pg_statistic_ext_data). Until now, there was a simple 1:1 mapping - for each definition there was one pg_statistic_ext_data row, and this row was inserted while creating the statistics (and then updated during ANALYZE). With the stxdinherit flag, we don't know how many rows there will be (child relations may be added after the statistics object is defined), so there may be up to two rows. We could make CREATE STATISTICS to always create both rows, but that seems wasteful - without partitioning we only need stxdinherit=false rows, and declaratively partitioned tables need only stxdinherit=true. So we no longer initialize pg_statistic_ext_data in CREATE STATISTICS, and instead make that a responsibility of ANALYZE. Which is what we do for regular statistics too. Patch by me, with extensive improvements and fixes by Justin Pryzby. Author: Tomas Vondra, Justin Pryzby Reviewed-by: Tomas Vondra, Justin Pryzby Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com	2022-01-16 13:38:01 +01:00
Peter Geoghegan	49c9d9fcfa	Unify VACUUM VERBOSE and autovacuum logging. The log_autovacuum_min_duration instrumentation used its own dedicated code for logging, which was not reused by VACUUM VERBOSE. This was highly duplicative, and sometimes led to each code path using slightly different accounting for essentially the same information. Clean things up by making VACUUM VERBOSE reuse the same instrumentation code. This code restructuring changes the structure of the VACUUM VERBOSE output itself, but that seems like an overall improvement. The most noticeable change in VACUUM VERBOSE output is that it no longer outputs a distinct message per index per round of index vacuuming. Most of the same information (about each index) is now shown in its new per-operation summary message. This is far more legible. A few details are no longer displayed by VACUUM VERBOSE, but that's no real loss in practice, especially in the common case where we don't need multiple index scans/rounds of vacuuming. This super fine-grained information is still available via DEBUG2 messages, which might still be useful in debugging scenarios. VACUUM VERBOSE now shows new instrumentation, which is typically very useful: all of the log_autovacuum_min_duration instrumentation that it missed out on before now. This includes information about WAL overhead, buffers hit/missed/dirtied information, and I/O timing information. VACUUM VERBOSE still retains a few INFO messages of its own. This is limited to output concerning the progress of heap rel truncation, as well as some basic information about parallel workers. These details are still potentially quite useful. They aren't a good fit for the log output, which must summarize the whole operation. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com	2022-01-14 16:50:34 -08:00
Thomas Munro	7170f2159f	Allow "in place" tablespaces. Provide a developer-only GUC allow_in_place_tablespaces, disabled by default. When enabled, tablespaces can be created with an empty LOCATION string, meaning that they should be created as a directory directly beneath pg_tblspc. This can be used for new testing scenarios, in a follow-up patch. Not intended for end-user usage, since it might confuse backup tools that expect symlinks. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGKpRWQ9SxdxxDmTBCJoR0YnFpMBe7kyzY8SUQk%2BHeskxg%40mail.gmail.com	2022-01-15 00:09:24 +13:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Alvaro Herrera	f4566345cf	Create foreign key triggers in partitioned tables too While user-defined triggers defined on a partitioned table have a catalog definition for both it and its partitions, internal triggers used by foreign keys defined on partitioned tables only have a catalog definition for its partitions. This commit fixes that so that partitioned tables get the foreign key triggers too, just like user-defined triggers. Moreover, like user-defined triggers, partitions' internal triggers will now also have their tgparentid set appropriately. This is to allow subsequent commit(s) to make the foreign key related events to be fired in some cases using the parent table triggers instead of those of partitions'. This also changes what tgisinternal means in some cases. Currently, it means either that the trigger is an internal implementation object of a foreign key constraint, or a "child" trigger on a partition cloned from the trigger on the parent. This commit changes it to only mean the former to avoid confusion. As for the latter, it can be told by tgparentid being nonzero, which is now true both for user- defined and foreign key's internal triggers. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Arne Roland <A.Roland@index.de> Discussion: https://postgr.es/m/CA+HiwqG7LQSK+n8Bki8tWv7piHD=PnZro2y6ysU2-28JS6cfgQ@mail.gmail.com	2022-01-05 19:00:13 -03:00

1 2 3 4 5 ...

846 commits