postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-13 22:28:01 -04:00

Author	SHA1	Message	Date
Jeff Davis	e839c8ecc9	Create functions pg_set_relation_stats, pg_clear_relation_stats. These functions are used to tweak statistics on any relation, provided that the user has MAINTAIN privilege on the relation, or is the database owner. Bump catalog version. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com	2024-10-11 16:55:11 -07:00
Daniel Gustafsson	6f782a2a17	Avoid mixing custom and OpenSSL BIO functions PostgreSQL has for a long time mixed two BIO implementations, which can lead to subtle bugs and inconsistencies. This cleans up our BIO by just just setting up the methods we need. This patch does not introduce any functionality changes. The following methods are no longer defined due to not being needed: - gets: Not used by libssl - puts: Not used by libssl - create: Sets up state not used by libpq - destroy: Not used since libpq use BIO_NOCLOSE, if it was used it close the socket from underneath libpq - callback_ctrl: Not implemented by sockets The following methods are defined for our BIO: - read: Used for reading arbitrary length data from the BIO. No change in functionality from the previous implementation. - write: Used for writing arbitrary length data to the BIO. No change in functionality from the previous implementation. - ctrl: Used for processing ctrl messages in the BIO (similar to ioctl). The only ctrl message which matters is BIO_CTRL_FLUSH used for writing out buffered data (or signal EOF and that no more data will be written). BIO_CTRL_FLUSH is mandatory to implement and is implemented as a no-op since there is no intermediate buffer to flush. BIO_CTRL_EOF is the out-of-band method for signalling EOF to read_ex based BIO's. Our BIO is not read_ex based but someone could accidentally call BIO_CTRL_EOF on us so implement mainly for completeness sake. As the implementation is no longer related to BIO_s_socket or calling SSL_set_fd, methods have been renamed to reference the PGconn and Port types instead. This also reverts back to using BIO_set_data, with our fallback, as a small optimization as BIO_set_app_data require the ex_data mechanism in OpenSSL. Author: David Benjamin <davidben@google.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAF8qwaCZ97AZWXtg_y359SpOHe+HdJ+p0poLCpJYSUxL-8Eo8A@mail.gmail.com	2024-10-11 21:58:58 +02:00
Nathan Bossart	4e1fad3787	Add pg_ls_summariesdir(). This function returns the name, size, and last modification time of each regular file in pg_wal/summaries. This allows administrators to grant privileges to view the contents of this directory without granting privileges on pg_ls_dir(), which allows listing the contents of many other directories. This commit also gives the pg_monitor predefined role EXECUTE privileges on the new pg_ls_summariesdir() function. Bumps catversion. Author: Yushi Ogiwara Reviewed-by: Michael Paquier, Fujii Masao Discussion: https://postgr.es/m/a0a3af15a9b9daa107739eb45aa9a9bc%40oss.nttdata.com	2024-10-11 11:02:09 -05:00
Álvaro Herrera	fd64ed60b6	Unbreak overflow test for attinhcount/coninhcount Commit `90189eefc1` narrowed pg_attribute.attinhcount and pg_constraint.coninhcount from 32 to 16 bits, but kept other related structs with 32-bit wide fields: ColumnDef and CookedConstraint contain an int 'inhcount' field which is itself checked for overflow on increments, but there's no check that the values aren't above INT16_MAX before assigning to the catalog columns. This means that a creative user can get a inconsistent table definition and override some protections. Fix it by changing those other structs to also use int16. Also, modernize style by using pg_add_s16_overflow for overflow testing instead of checking for negative values. We also have Constraint.inhcount, which is here removed completely. This was added by commit `b0e96f3119` and not removed by its revert at `6f8bb7c1e9`. It is not needed by the upcoming not-null constraints patch. This is mostly academic, so we agreed not to backpatch to avoid ABI problems. Bump catversion because of the changes to parse nodes. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Co-authored-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202410081611.up4iyofb5ie7@alvherre.pgsql	2024-10-10 17:41:01 +02:00
Fujii Masao	1909835c28	Improve descriptions of some pg_stat_checkpoints functions in pg_proc.dat. Previously, the descriptions of pg_stat_get_checkpointer_num_requested(), pg_stat_get_checkpointer_restartpoints_requested(), and pg_stat_get_checkpointer_restartpoints_performed() in pg_proc.dat referred to "backend". This was misleading because these functions report the number of checkpoints or restartpoints requested or performed by other than backends as well. This commit removes "backend" from these descriptions to avoid confusion. Bump catalog version. Idea from Anton A. Melnikov Author: Fujii Masao Reviewed-by: Anton A. Melnikov Discussion: https://postgr.es/m/8e5f353f-8b31-4a8e-9cfa-c037f22b4aee@postgrespro.ru	2024-10-11 00:12:29 +09:00
Michael Paquier	de3a2ea3b2	Introduce two fields in EState to track parallel worker activity These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:07:48 +09:00
Tom Lane	2d24fd942c	Add min and max aggregates for bytea type. Similar to `a0f1fce80`, although we chose to duplicate logic rather than invoke byteacmp, primarily to avoid repeat detoasting. Marat Buharov, Aleksander Alekseev Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com	2024-10-08 13:52:14 -04:00
Andres Freund	57f3702471	Use aux process resource owner in walsender AIO will need a resource owner to do IO. Right now we create a resowner on-demand during basebackup, and we could do the same for AIO. But it seems easier to just always create an aux process resowner. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Andres Freund	755a4c10d1	bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers() With real AIO it doesn't make sense to cross segment boundaries with one IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be merged. We could continue to cross segment boundaries when not using AIO, but it doesn't really make sense, because md.c will never be able to perform the read across the segment boundary in one system call. Which means we'll mark more buffers as undergoing IO than really makes sense - if another backend desires to read the same blocks, it'll be blocked longer than necessary. So it seems better to just never cross the boundary. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Heikki Linnakangas	2bbc261ddb	Use an shmem_exit callback to remove backend from PMChildFlags on exit This seems nicer than having to duplicate the logic between InitProcess() and ProcKill() for which child processes have a PMChildFlags slot. Move the MarkPostmasterChildActive() call earlier in InitProcess(), out of the section protected by the spinlock. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-10-08 15:06:34 +03:00
Fujii Masao	4ac2a9bece	Add REJECT_LIMIT option to the COPY command. Previously, when ON_ERROR was set to 'ignore', the COPY command would skip all rows with data type conversion errors, with no way to limit the number of skipped rows before failing. This commit introduces the REJECT_LIMIT option, allowing users to specify the maximum number of erroneous rows that can be skipped. If more rows encounter data type conversion errors than allowed by REJECT_LIMIT, the COPY command will fail with an error, even when ON_ERROR = 'ignore'. Author: Atsushi Torikoshi Reviewed-by: Junwang Zhao, Kirill Reshke, jian he, Fujii Masao Discussion: https://postgr.es/m/63f99327aa6b404cc951217fa3e61fe4@oss.nttdata.com	2024-10-08 18:19:58 +09:00
Nathan Bossart	8275325a06	Restrict password hash length. Commit `6aa44060a3` removed pg_authid's TOAST table because the only varlena column is rolpassword, which cannot be de-TOASTed during authentication because we haven't selected a database yet and cannot read pg_class. Since that change, attempts to set password hashes that require out-of-line storage will fail with a "row is too big" error. This error message might be confusing to users. This commit places a limit on the length of password hashes so that attempts to set long password hashes will fail with a more user-friendly error. The chosen limit of 512 bytes should be sufficient to avoid "row is too big" errors independent of BLCKSZ, but it should also be lenient enough for all reasonable use-cases (or at least all the use-cases we could imagine). Reviewed-by: Tom Lane, Jonathan Katz, Michael Paquier, Jacob Champion Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com	2024-10-07 10:56:16 -05:00
Heikki Linnakangas	094ae07160	Remove unneeded #include Unneeded since commit `d72731a704`. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi	2024-10-05 15:09:32 +03:00
Heikki Linnakangas	6c0c49f7d3	Remove unused latch It was left unused by commit `bc971f4025`, which replaced the latch usage with a condition variable Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi	2024-10-05 15:09:27 +03:00
Fujii Masao	e7834a1a25	Add log_verbosity = 'silent' support to COPY command. Previously, when the on_error option was set to ignore, the COPY command would always log NOTICE messages for input rows discarded due to data type incompatibility. Users had no way to suppress these messages. This commit introduces a new log_verbosity setting, 'silent', which prevents the COPY command from emitting NOTICE messages when on_error = 'ignore' is used, even if rows are discarded. This feature is particularly useful when processing malformed files frequently, where a flood of NOTICE messages can be undesirable. For example, when frequently loading malformed files via the COPY command or querying foreign tables using file_fdw (with an upcoming patch to add on_error support for file_fdw), users may prefer to suppress these messages to reduce log noise and improve clarity. Author: Atsushi Torikoshi Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com	2024-10-03 15:55:37 +09:00
Robert Haas	d94cf5ca7f	File size in a backup manifest should use uint64, not size_t. size_t is the size of an object in memory, not the size of a file on disk. Thanks to Tom Lane for noting the error. Discussion: http://postgr.es/m/1865585.1727803933@sss.pgh.pa.us	2024-10-02 09:59:04 -04:00
Fujii Masao	17cc5f666f	Fix inconsistent reporting of checkpointer stats. Previously, the pg_stat_checkpointer view and the checkpoint completion log message could show different numbers for buffers written during checkpoints. The view only counted shared buffers, while the log message included both shared and SLRU buffers, causing inconsistencies. This commit resolves the issue by updating both the view and the log message to separately report shared and SLRU buffers written during checkpoints. A new slru_written column is added to the pg_stat_checkpointer view to track SLRU buffers, while the existing buffers_written column now tracks only shared buffers. This change would help users distinguish between the two types of buffers, in the pg_stat_checkpointer view and the checkpoint complete log message, respectively. Bump catalog version. Author: Nitin Jadhav Reviewed-by: Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Robert Haas Reviewed-by: Andres Freund, vignesh C, Fujii Masao Discussion: https://postgr.es/m/CAMm1aWb18EpT0whJrjG+-nyhNouXET6ZUw0pNYYAe+NezpvsAA@mail.gmail.com	2024-10-02 11:17:47 +09:00
Peter Eisentraut	10b721821d	Use macro to define the number of enum values Refactoring in the interest of code consistency, a follow-up to `2e068db56e`. The argument against inserting a special enum value at the end of the enum definition is that a switch statement might generate a compiler warning unless it has a default clause. Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com	2024-10-01 09:30:24 -04:00
Peter Eisentraut	9c2a6c5a5f	Simplify checking for xlocale.h Instead of XXX_IN_XLOCALE_H for several features XXX, let's just include <xlocale.h> if HAVE_XLOCALE_H. The reason for the extra complication was apparently that some old glibc systems also had an <xlocale.h>, and you weren't supposed to include it directly, but it's gone now (as far as I can tell it was harmless to do so anyway). Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech	2024-10-01 07:23:45 -04:00
Peter Eisentraut	ee4859123e	jit: Use opaque pointers in all supported LLVM versions. LLVM's opaque pointer change began in LLVM 14, but remained optional until LLVM 16. When commit `37d5babb` added opaque pointer support, we didn't turn it on for LLVM 14 and 15 yet because we didn't want to risk weird bitcode incompatibility problems in released branches of PostgreSQL. (That might have been overly cautious, I don't know.) Now that PostgreSQL 18 has dropped support for LLVM versions < 14, and since it hasn't been released yet and no extensions or bitcode have been built against it in the wild yet, we can be more aggressive. We can rip out the support code and build system clutter that made opaque pointer use optional. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussions: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com	2024-10-01 06:10:15 -04:00
Michael Paquier	4c7cd07aa6	Bump catalog version for change in VariableSetStmt Oversight in `dc68515968`, as this breaks SQL functions with a SET command. Reported-by: Tom Lane Discussion: https://postgr.es/m/1364409.1727673407@sss.pgh.pa.us	2024-09-30 14:52:03 +09:00
Michael Paquier	dc68515968	Show values of SET statements as constants in pg_stat_statements This is a continuation of work like `11c34b342b`, done to reduce the bloat of pg_stat_statements by applying more normalization to query entries. This commit is able to detect and normalize values in VariableSetStmt, resulting in: SET conf_param = $1 Compared to other parse nodes, VariableSetStmt is embedded in much more places in the parser, impacting many query patterns in pg_stat_statements. A custom jumble function is used, with an extra field in the node to decide if arguments should be included in the jumbling or not, a location field being not enough for this purpose. This approach allows for a finer tuning. Clauses relying on one or more keywords are not normalized, for example: * DEFAULT * FROM CURRENT * List of keywords. SET SESSION CHARACTERISTICS AS TRANSACTION, where it is critical to differentiate different sets of options, is a good example of why normalization should not happen. Some queries use VariableSetStmt for some subclauses with SET, that also have their values normalized: - ALTER DATABASE - ALTER ROLE - ALTER SYSTEM - CREATE/ALTER FUNCTION `ba90eac7a9` has added test coverage for most of the existing SET patterns. The expected output of these tests shows the difference this commit creates. Normalization could be perhaps applied to more portions of the grammar but what is done here is conservative, and good enough as a starting point. Author: Greg Sabino Mullane, Michael Paquier Discussion: https://postgr.es/m/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com	2024-09-30 14:02:00 +09:00
Fujii Masao	559efce1d6	Add num_done counter to the pg_stat_checkpointer view. Checkpoints can be skipped when the server is idle. The existing num_timed and num_requested counters in pg_stat_checkpointer track both completed and skipped checkpoints, but there was no way to count only the completed ones. This commit introduces the num_done counter, which tracks only completed checkpoints, making it easier to see how many were actually performed. Bump catalog version. Author: Anton A. Melnikov Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com	2024-09-30 11:56:05 +09:00
Tom Lane	a3179ab692	Recalculate where-needed data accurately after a join removal. Up to now, remove_rel_from_query() has done a pretty shoddy job of updating our where-needed bitmaps (per-Var attr_needed and per-PlaceHolderVar ph_needed relid sets). It removed direct mentions of the to-be-removed baserel and outer join, which is the minimum amount of effort needed to keep the data structures self-consistent. But it didn't account for the fact that the removed join ON clause probably mentioned Vars of other relations, and those Vars might now not be needed as high up in the join tree as before. It's easy to show cases where this results in failing to remove a lower outer join that could also have been removed. To fix, recalculate the where-needed bitmaps from scratch after each successful join removal. This sounds expensive, but it seems to add only negligible planner runtime. (We cheat a little bit by preserving "relation 0" entries in the bitmaps, allowing us to skip re-scanning the targetlist and HAVING qual.) The submitted test case drew attention because we had successfully optimized away the lower join prior to v16. I suspect that that's somewhat accidental and there are related cases that were never optimized before and now can be. I've not tried to come up with one, though. Perhaps we should back-patch this into v16 and v17 to repair the performance regression. However, since it took a year for anyone to notice the problem, it can't be affecting too many people. Let's let the patch bake awhile in HEAD, and see if we get more complaints. Per bug #18627 from Mikaël Gourlaouen. No back-patch for now. Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org	2024-09-27 16:04:04 -04:00
Robert Haas	8dfd312902	pg_verifybackup: Verify tar-format backups. This also works for compressed tar-format backups. However, -n must be used, because we use pg_waldump to verify WAL, and it doesn't yet know how to verify WAL that is stored inside of a tarfile. Amul Sul, reviewed by Sravan Kumar and by me, and revised by me.	2024-09-27 08:40:24 -04:00
Michael Paquier	f762d99c87	Fix catalog data of new LO privilege functions This commit improves the catalog data in pg_proc for the three functions for has_largeobject_privilege(), introduced in `4eada203a5`: - Fix their descriptions (typos and consistency). - Reallocate OIDs to be within the 8000-9999 range as required by `a6417078c4`. Bump catalog version. Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/ZvUYR0V0dzWaLnsV@paquier.xyz	2024-09-27 07:26:29 +09:00
Alexander Korotkov	e658038772	Update oid for pg_wal_replay_wait() procedure Use an oid from 8000-9999 range, as required by `98eab30b93`. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZvUY6bfTwB0GsyzP%40paquier.xyz	2024-09-26 11:49:41 +03:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit `86dc90056d`, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
Noah Misch	a07e03fd8f	Fix data loss at inplace update after heap_update(). As previously-added tests demonstrated, heap_inplace_update() could instead update an unrelated tuple of the same catalog. It could lose the update. Losing relhasindex=t was a source of index corruption. Inplace-updating commands like VACUUM will now wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a long-running GRANT already hurts VACUUM progress more just by keeping an XID running. The VACUUM will behave like a DELETE or UPDATE waiting for the uncommitted change. For implementation details, start at the systable_inplace_update_begin() header comment and README.tuplock. Back-patch to v12 (all supported versions). In back branches, retain a deprecated heap_inplace_update(), for extensions. Reported by Smolkin Grigory. Reviewed by Nitin Motiani, (in earlier versions) Heikki Linnakangas, and (in earlier versions) Alexander Lakhin. Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com	2024-09-24 15:25:18 -07:00
Jeff Davis	ac30021356	Allow length=-1 for NUL-terminated input to pg_strncoll(), etc. Like ICU, allow a length of -1 to be specified for NUL-terminated arguments to pg_strncoll(), pg_strnxfrm(), and pg_strnxfrm_prefix(). Simplifies the code and comments. Discussion: https://postgr.es/m/2d758e07dff26bcc7cbe2aec57431329bfe3679a.camel@j-davis.com	2024-09-24 15:15:18 -07:00
Jeff Davis	ceeaaed87a	Tighten up make_libc_collator() and make_icu_collator(). Ensure that error paths within these functions do not leak a collator, and return the result rather than using an out parameter. (Error paths in the caller may still result in a leaked collator, which will be addressed separately.) In make_libc_collator(), if the first newlocale() succeeds and the second one fails, close the first locale_t object. The function make_icu_collator() doesn't have any external callers, so change it to be static. Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com	2024-09-24 12:01:45 -07:00
Nathan Bossart	6aa44060a3	Remove pg_authid's TOAST table. pg_authid's only varlena column is rolpassword, which unfortunately cannot be de-TOASTed during authentication because we haven't selected a database yet and cannot read pg_class. By removing pg_authid's TOAST table, attempts to set password hashes that require out-of-line storage will fail with a "row is too big" error instead. We may want to provide a more user-friendly error in the future, but for now let's just remove the useless TOAST table. Bumps catversion. Reported-by: Alexander Lakhin Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com	2024-09-21 15:17:46 -05:00
Tomas Vondra	c4d5cb71d2	Increase the number of fast-path lock slots Replace the fixed-size array of fast-path locks with arrays, sized on startup based on max_locks_per_transaction. This allows using fast-path locking for workloads that need more locks. The fast-path locking introduced in 9.2 allowed each backend to acquire a small number (16) of weak relation locks cheaply. If a backend needs to hold more locks, it has to insert them into the shared lock table. This is considerably more expensive, and may be subject to contention (especially on many-core systems). The limit of 16 fast-path locks was always rather low, because we have to lock all relations - not just tables, but also indexes, views, etc. For planning we need to lock all relations that might be used in the plan, not just those that actually get used in the final plan. So even with rather simple queries and schemas, we often need significantly more than 16 locks. As partitioning gets used more widely, and the number of partitions increases, this limit is trivial to hit. Complex queries may easily use hundreds or even thousands of locks. For workloads doing a lot of I/O this is not noticeable, but for workloads accessing only data in RAM, the access to the shared lock table may be a serious issue. This commit removes the hard-coded limit of the number of fast-path locks. Instead, the size of the fast-path arrays is calculated at startup, and can be set much higher than the original 16-lock limit. The overall fast-path locking protocol remains unchanged. The variable-sized fast-path arrays can no longer be part of PGPROC, but are allocated as a separate chunk of shared memory and then references from the PGPROC entries. The fast-path slots are organized as a 16-way set associative cache. You can imagine it as a hash table of 16-slot "groups". Each relation is mapped to exactly one group using hash(relid), and the group is then processed using linear search, just like the original fast-path cache. With only 16 entries this is cheap, with good locality. Treating this as a simple hash table with open addressing would not be efficient, especially once the hash table gets almost full. The usual remedy is to grow the table, but we can't do that here easily. The access would also be more random, with worse locality. The fast-path arrays are sized using the max_locks_per_transaction GUC. We try to have enough capacity for the number of locks specified in the GUC, using the traditional 2^n formula, with an upper limit of 1024 lock groups (i.e. 16k locks). The default value of max_locks_per_transaction is 64, which means those instances will have 64 fast-path slots. The main purpose of the max_locks_per_transaction GUC is to size the shared lock table. It is often set to the "average" number of locks needed by backends, with some backends using significantly more locks. This should not be a major issue, however. Some backens may have to insert locks into the shared lock table, but there can't be too many of them, limiting the contention. The only solution is to increase the GUC, even if the shared lock table already has sufficient capacity. That is not free, especially in terms of memory usage (the shared lock table entries are fairly large). It should only happen on machines with plenty of memory, though. In the future we may consider a separate GUC for the number of fast-path slots, but let's try without one first. Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com	2024-09-21 20:09:35 +02:00
Tom Lane	54562c9cfa	Improve Asserts checking relation matching in parallel scans. table_beginscan_parallel and index_beginscan_parallel contain Asserts checking that the relation a worker will use in a parallel scan is the same one the leader intended. However, they were checking for relation OID match, which was not strong enough to detect the mismatch problem fixed in `126ec0bc7`. What would be strong enough is to compare relfilenodes instead. Arguably, that's a saner definition anyway, since a scan surely operates on a physical relation not a logical one. Hence, store and compare RelFileLocators not relation OIDs. Also ensure that index_beginscan_parallel checks the index identity not just the table identity. Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us	2024-09-20 16:37:55 -04:00
Alexander Korotkov	014f9f34d2	Move pg_wal_replay_wait() to xlogfuncs.c This commit moves pg_wal_replay_wait() procedure to be a neighbor of WAL-related functions in xlogfuncs.c. The implementation of LSN waiting continues to reside in the same place. By proposal from Michael Paquier. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org	2024-09-19 14:26:11 +03:00
Nathan Bossart	b52c4fc3c0	Add TOAST table to pg_index. This change allows pg_index rows to use out-of-line storage for the "indexprs" and "indpred" columns, which enables use-cases such as very large index expressions. This system catalog was previously not given a TOAST table due to a fear of circularity issues (see commit `96cdeae07f`). Testing has not revealed any such problems, and it seems unlikely that the entries for system indexes could ever need out-of-line storage. In any case, it is still early in the v18 development cycle, so committing this now will hopefully increase the chances of finding any unexpected problems prior to release. Bumps catversion. Reported-by: Jonathan Katz Reviewed-by: Tom Lane Discussion: https://postgr.es/m/b611015f-b423-458c-aa2d-be0e655cc1b4%40postgresql.org	2024-09-18 14:42:57 -05:00
Michael Paquier	b14e9ce7d5	Extend PgStat_HashKey.objid from 4 to 8 bytes This opens the possibility to define keys for more types of statistics kinds in PgStat_HashKey, the first case being 8-byte query IDs for statistics like pg_stat_statements. This increases the size of PgStat_HashKey from 12 to 16 bytes, while PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps the same size due to alignment. xl_xact_stats_item, that tracks the stats items to drop in commit WAL records, is increased from 12 to 16 bytes. Note that individual chunks in commit WAL records should be multiples of sizeof(int), hence 8-byte object IDs are stored as two uint32, based on a suggestion from Heikki Linnakangas. While on it, the field of PgStat_HashKey is renamed from "objoid" to "objid", as for some stats kinds this field does not refer to OIDs but just IDs, like for replication slot stats. This commit bumps the following format variables: - PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file for non-serialized stats kinds in the dshash table. - XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item. - Catalog version, for the SQL function pg_stat_have_stats(). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz	2024-09-18 12:44:15 +09:00
Thomas Munro	70d38e3d8a	Allow ReadStream to be consumed as raw block numbers. Commits `041b9680` and `6377e12a` changed the interface of scan_analyze_next_block() to take a ReadStream instead of a BlockNumber and a BufferAccessStrategy, and to return a value to indicate when the stream has run out of blocks. This caused integration problems for at least one known extension that uses specially encoded BlockNumber values that map to different underlying storage, because acquire_sample_rows() sets up the stream so that read_stream_next_buffer() reads blocks from the main fork of the relation's SMgrRelation. Provide read_stream_next_block(), as a way for such an extension to access the stream of raw BlockNumbers directly and forward them to its own ReadBuffer() calls after decoding, as it could in earlier releases. The new function returns the BlockNumber and BufferAccessStrategy that were previously passed directly to scan_analyze_next_block(). Alternatively, an extension could wrap the stream of BlockNumbers in another ReadStream with a callback that performs any decoding required to arrive at real storage manager BlockNumber values, so that it could benefit from the I/O combining and concurrency provided by read_stream.c. Another class of table access method that does nothing in scan_analyze_next_block() because it is not block-oriented could use this function to control the number of block sampling loops. It could match the previous behavior with "return read_stream_next_block(stream, &bas) != InvalidBlockNumber". Ongoing work is expected to provide better ANALYZE support for table access methods that don't behave like heapam with respect to storage blocks, but that will be for future releases. Back-patch to 17. Reported-by: Mats Kindahl <mats@timescale.com> Reviewed-by: Mats Kindahl <mats@timescale.com> Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com	2024-09-18 11:34:28 +12:00
Peter Eisentraut	89f908a6d0	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. (previously committed as `34768ee361`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	7406ab623f	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. (previously committed as `6db4598fcb`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:29 +02:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit `e9931bfb75`. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Fujii Masao	4eada203a5	Add has_largeobject_privilege function. This function checks whether a user has specific privileges on a large object, identified by OID. The user can be provided by name, OID, or default to the current user. If the specified large object doesn't exist, the function returns NULL. It raises an error for a non-existent user name. This behavior is basically consistent with other privilege inquiry functions like has_table_privilege. Bump catalog version. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:51:26 +09:00
Fujii Masao	412229d197	Deduplicate code in LargeObjectExists and myLargeObjectExists. myLargeObjectExists() and LargeObjectExists() had nearly identical code, except for handling snapshots. This commit renames myLargeObjectExists() to LargeObjectExistsWithSnapshot() and refactors LargeObjectExists() to call it internally, reducing duplication. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:45:42 +09:00
Peter Eisentraut	23d0b48468	Remove hardcoded hash opclass function signature exceptions hashvalidate(), which validates the signatures of support functions for the hash AM, contained several hardcoded exceptions. For example, hash/date_ops support function 1 was hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(). hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat because some extensions currently use them to build hash functions for their own types, and we need to keep exposing these functions as "LANGUAGE internal" functions for that to continue to work. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org	2024-09-12 12:57:43 +02:00
Michael Paquier	00c76cf21c	Move logic related to WAL replay of Heap/Heap2 into its own file This brings more clarity to heapam.c, by cleanly separating all the logic related to WAL replay and the rest of Heap and Heap2, similarly to other RMGRs like hash, btree, etc. The header reorganization is also nice in heapam.c, cutting half of the headers required. Author: Li Yong Reviewed-by: Sutou Kouhei, Michael Paquier Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com	2024-09-12 13:32:05 +09:00
David Rowley	9fba1ed294	Adjust tuplestore stats API `1eff8279d` added an API to tuplestore.c to allow callers to obtain storage telemetry data. That API wasn't quite good enough for callers that perform tuplestore_clear() as the telemetry functions only accounted for the current state of the tuplestore, not the maximums before tuplestore_clear() was called. There's a pending patch that would like to add tuplestore telemetry output to EXPLAIN ANALYZE for WindowAgg. That node type uses tuplestore_clear() before moving to the next window partition and we want to show the maximum space used, not the space used for the final partition. Reviewed-by: Tatsuo Ishii, Ashutosh Bapat Discussion: https://postgres/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com	2024-09-12 16:02:01 +12:00
Peter Eisentraut	0785d1b8b2	common/jsonapi: support libpq as a client Based on a patch by Michael Paquier. For libpq, use PQExpBuffer instead of StringInfo. This requires us to track allocation failures so that we can return JSON_OUT_OF_MEMORY as needed rather than exit()ing. Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Co-authored-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-09-11 09:01:07 +02:00
Peter Eisentraut	56fead44dc	Add amgettreeheight index AM API routine The only current implementation is for btree where it calls _bt_getrootheight(). Other index types can now also use this to pass information to their amcostestimate routine. Previously, btree was hardcoded and other index types could not hook into the optimizer at this point. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-09-10 10:03:23 +02:00
Richard Guo	f5050f795a	Mark expressions nullable by grouping sets When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:36:48 +09:00

1 2 3 4 5 ...

11739 commits