postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-29 10:11:47 -04:00

Author	SHA1	Message	Date
Amit Kapila	f2dbc83501	Fix use-after-free issue in slot synchronization. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 18, where it was introduced Discussion: https://postgr.es/m/CANhcyEXMrcEdzj-RNGJam0nJHM4y+ttdWsgUCFmXciM7BNKc7A@mail.gmail.com	2025-09-03 06:31:05 +00:00
Nathan Bossart	6fbd7b93c6	Remove unused parameter from ProcessSlotSyncInterrupts(). Oversight in commit `93db6cbda0`. Author: ChangAo Chen <cca5507@qq.com> Discussion: https://postgr.es/m/tencent_7B42BBE8D0A5C28DDAB91436192CBCCB8307%40qq.com	2025-08-29 10:56:10 -05:00
Álvaro Herrera	325fc0ab14	Avoid including commands/dbcommands.h in so many places This has been done historically because of get_database_name (which since commit `cb98e6fb8f` belongs in lsyscache.c/h, so let's move it there) and get_database_oid (which is in the right place, but whose declaration should appear in pg_database.h rather than dbcommands.h). Clean this up. Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h since commit `f1fd515b39`, so remove them. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql	2025-08-28 12:39:04 +02:00
Fujii Masao	4614d53d4e	Avoid unexpected shutdown when sync_replication_slots is enabled. Previously, enabling sync_replication_slots while wal_level was not set to logical could cause the server to shut down. This was because the postmaster performed a configuration check before launching the slot synchronization worker and raised an ERROR if the settings were incompatible. Since ERROR is treated as FATAL in the postmaster, this resulted in the entire server shutting down unexpectedly. This commit changes the postmaster to log that message with a LOG-level instead of raising an ERROR, allowing the server to continue running even with the misconfiguration. Back-patch to v17, where slot synchronization was introduced. Reported-by: Hugo DUBOIS <hdubois@scaleway.com> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com Backpatch-through: 17	2025-08-04 20:51:42 +09:00
Álvaro Herrera	2633dae2e4	Standardize LSN formatting by zero padding This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-07-07 13:57:43 +02:00
Peter Eisentraut	50fd428b2b	Message style improvements	2025-06-28 19:18:06 +02:00
Amit Kapila	1546e17f9d	Improve log messages and docs for slot synchronization. Improve the clarity of LOG messages when a failover logical slot synchronization fails, making the reasons more explicit for easier debugging. Update the documentation to outline scenarios where slot synchronization can fail, especially during the initial sync, and emphasize that pg_sync_replication_slot() is primarily intended for testing and debugging purposes. We also discussed improving the functionality of pg_sync_replication_slot() so that it can be used reliably, but we would take up that work for next version after some more discussion and review. Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com> Author: shveta malik <shveta.malik@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com	2025-06-19 09:48:08 +05:30
Amit Kapila	3ff2a1f0c9	Fix assertion failure during decoding from synced slots. The slot synchronization skips updating the confirmed_flush LSN of the local slot if the local slot has a newer catalog_xmin or restart_lsn, but still allows updating the two_phase and two_phase_at fields of the slot. This opens up a window for the prepared transactions between old confirmed_flush LSN and two_phase_at to unexpectedly get decoded and sent to the downstream after promotion. Then, while decoding the commit prepared the assert will fail, which expects that the prepare hasn't been sent to the downstream. The fix is to skip updating the other slot fields when we are skipping to update the confirmed_flush LSN of the slot. We didn't backpatch this commit as two_phase_at was not synced in back branches, which means prepared transactions won't be unexpectedly sent to downstream. We discovered this problem while analyzing BF failure reported in the discussion link. Reliably reproducing this issue without a debugger is difficult. Given its rarity, adding specific injection point to test it doesn't seem worthwhile, so we won't be adding a dedicated test case. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716B44052000EB91EFAE60E94BC2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2025-04-29 12:52:05 +05:30
Amit Kapila	4868c96bc8	Fix slot synchronization for two_phase enabled slots. The issue is that the transactions prepared before two-phase decoding is enabled can fail to replicate to the subscriber after being committed on a promoted standby following a failover. This is because the two_phase_at field of a slot, which tracks the LSN from which two-phase decoding starts, is not synchronized to standby servers. Without two_phase_at, the logical decoding might incorrectly identify prepared transaction as already replicated to the subscriber after promotion of standby server, causing them to be skipped. To address the issue on HEAD, the two_phase_at field of the slot is exposed by the pg_replication_slots view and allows the slot synchronization to copy this value to the corresponding synced slot on the standby server. This bug is likely to occur if the user toggles the two_phase option to true after initial slot creation. Given that altering the two_phase option of a replication slot is not allowed in PostgreSQL 17, this bug is less likely to occur. We can't change the view/function definition in backbranch so we can't push the same fix but we are brainstorming an appropriate solution for PG17. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com	2025-04-03 12:26:54 +05:30
Peter Eisentraut	7202d72787	backend launchers void * arguments for binary data Change backend launcher functions to take void * for binary data instead of char *. This removes the need for numerous casts. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-21 08:03:33 +01:00
Amit Kapila	0ec3c295e7	Avoid updating inactive_since for invalid replication slots. It is possible for the inactive_since value of an invalid replication slot to be updated multiple times, which is unexpected behavior like during the release of the slot or at the time of restart. This is harmless because invalid slots are not allowed to be accessed but it is not prudent to update invalid slots. We are planning to invalidate slots due to other reasons like idle time and it will look odd that the slot's inactive_since displays the recent time in this field after invalidated due to idle time. So, this patch ensures that the inactive_since field of slots is not updated for invalid slots. In the passing, ensure to use the same inactive_since time for all the slots at restart while restoring them from the disk. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM7QdifQ_MHmMA=Cc4v8+MeckkwKncm2Nn6tX9wSCQ-+iw@mail.gmail.com	2025-02-05 08:56:14 +05:30
Amit Kapila	f41d8468dd	Raise an error while trying to acquire an invalid slot. Once a replication slot is invalidated, it cannot be altered or used to fetch changes. However, a process could still acquire an invalid slot and fail later. For example, if a process acquires a logical slot that was invalidated due to wal_removed, it will eventually fail in CreateDecodingContext() when attempting to access the removed WAL. Similarly, for physical replication slots, even if the slot is invalidated and invalidation_reason is set to wal_removed, the walsender does not currently check for invalidation when starting physical replication. Instead, replication starts, and an error is only reported later while trying to access WAL. Similarly, we prohibit modifying slot properties for invalid slots but give the error for the same after acquiring the slot. This patch improves error handling by detecting invalid slots earlier at the time of slot acquisition which is the first step. This also helped in unifying different ERROR messages at different places and gave a consistent message for invalid slots. This means that the message for invalid slots will change to a generic message. This will also be helpful for future patches where we are planning to invalidate slots due to more reasons like idle_timeout because we don't have to modify multiple places in such cases and avoid the chances of missing out on a particular place. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM6pBL5hPnSQ+5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw@mail.gmail.com	2025-01-31 10:27:35 +05:30
Bruce Momjian	50e6eb731d	Update copyright for 2025 Backpatch-through: 13	2025-01-01 11:21:55 -05:00
Amit Kapila	d05a387d9d	Doc: Clarify the `inactive_since` field description. Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com	2024-11-25 11:12:32 +05:30
Peter Eisentraut	e18512c000	Remove unused #include's from backend .c files as determined by IWYU These are mostly issues that are new since commit `dbbca2cf29`. Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org	2024-10-27 08:26:50 +01:00
Michael Paquier	b4db64270e	Apply more quoting to GUC names in messages This is a continuation of `17974ec259`. More quotes are applied to GUC names in error messages and hints, taking care of what seems to be all the remaining holes currently in the tree for the GUCs. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2024-09-04 13:50:44 +09:00
Michael Paquier	4236825197	Fix typos and grammar in code comments and docs Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com	2024-09-03 14:49:04 +09:00
Peter Eisentraut	edee0c621d	Message style improvements	2024-08-29 14:43:34 +02:00
Peter Eisentraut	dc26ff2f22	Fix misplaced translator comments They did not immediately precede the code they were applying to.	2024-08-27 16:15:28 +02:00
Tom Lane	0d8bd0a72e	Improve logical replication connection-failure messages. These messages mostly said "could not connect to the publisher: %s" which is lacking context. Add some verbiage to indicate which subscription or worker process is failing. Nisha Moond Discussion: https://postgr.es/m/CABdArM7q1=zqL++cYd0hVMg3u_tc0S=0Of=Um-KvDhLony0cSg@mail.gmail.com	2024-07-11 13:21:13 -04:00
Heikki Linnakangas	eb21f5bc67	Remove redundant SetProcessingMode(InitProcessing) calls After several refactoring iterations, auxiliary processes are no longer initialized from the bootstrapper. Using the InitProcessing mode for initializing auxiliary processes is more appropriate. Since the global variable Mode is initialized to InitProcessing, we can just remove the redundant calls of SetProcessingMode(InitProcessing). Author: Xing Guo <higuoxing@gmail.com> Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com	2024-07-02 20:14:40 +03:00
Peter Eisentraut	720b0eaae9	Convert some extern variables to static These probably should have been static all along, it was only forgotten out of sloppiness. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-02 07:26:22 +02:00
Amit Kapila	2357c9223b	Rename standby_slot_names to synchronized_standby_slots. The standby_slot_names GUC allows the specification of physical standby slots that must be synchronized before the logical walsenders associated with logical failover slots. However, for this purpose, the GUC name is too generic. Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Masahiko Sawada Backpatch-through: 17 Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us	2024-07-01 11:36:56 +05:30
Peter Eisentraut	e9b7aee272	A few follow-up fixes for GUC name quoting Fixups for `17974ec259`: Some messages were missed (and some were new since the patch was originally proposed), and there was a typo introduced.	2024-05-17 13:48:31 +02:00
Peter Eisentraut	6d716adf85	Fix incorrect format placeholder	2024-05-08 08:37:46 +02:00
David Rowley	a42fc1c903	Fix an assortment of typos Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com	2024-05-04 02:33:25 +12:00
David Rowley	310cd8ab38	Fix duplicated consecutive words in comments Also, fix a comment incorrectly referencing the "streaming read API". This was renamed to "read stream" shortly before being committed. Discussion: https://postgr.es/m/CAApHDvq-2Zdqytm_Hf3RmVf0qg5PS9jTFAJ5QTc9xH9pwvwDTA@mail.gmail.com	2024-04-28 20:03:34 +12:00
Amit Kapila	db08e8c6fa	Post-commit review fixes for slot synchronization. Allow pg_sync_replication_slots() to error out during promotion of standby. This makes the behavior of the SQL function consistent with the slot sync worker. We also ensured that pg_sync_replication_slots() cannot be executed if sync_replication_slots is enabled and the slotsync worker is already running to perform the synchronization of slots. Previously, it would have succeeded in cases when the worker is idle and failed when it is performing sync which could confuse users. This patch fixes another issue in the slot sync worker where SignalHandlerForShutdownRequest() needs to be registered before setting SlotSyncCtx->pid, otherwise, the slotsync worker could miss handling SIGINT sent by the startup process(ShutDownSlotSync) if it is sent before worker could register SignalHandlerForShutdownRequest(). To be consistent, all signal handlers' registration is moved to a prior location before we set the worker's pid. Ensure that we clean up synced temp slots at the end of pg_sync_replication_slots() to avoid such slots being left over after promotion. Ensure that ShutDownSlotSync() captures SlotSyncCtx->pid under spinlock to avoid accessing invalid value as it can be reset by concurrent slot sync exit due to an error. Author: Shveta Malik Reviewed-by: Hou Zhijie, Bertrand Drouvot, Amit Kapila, Masahiko Sawada Discussion: https://postgr.es/m/CAJpy0uBefXUS_TSz=oxmYKHdg-fhxUT0qfjASW3nmqnzVC3p6A@mail.gmail.com	2024-04-25 14:01:44 +05:30
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Amit Kapila	3741f2a09d	Fix the review comments and a bug in the slot sync code. Ensure that when updating the catalog_xmin of the synced slots, it is first written to disk before changing the in-memory value (effective_catalog_xmin). This is to prevent a scenario where the in-memory value change triggers a vacuum to remove catalog tuples before the catalog_xmin is written to disk. In the event of a crash before the catalog_xmin is persisted, we would not know that some required catalog tuples have been removed and the synced slot would be invalidated. Change the sanity check to ensure that remote_slot's confirmed_flush LSN can't precede the local/synced slot during slot sync. Note that the restart_lsn of the synced/local slot can be ahead of remote_slot. This can happen when slot advancing machinery finds a running xacts record after reaching the consistent state at a later point than the primary where it serializes the snapshot and updates the restart_lsn. Make the check to sync slots robust by allowing to sync only when the confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of the synced/local slot. Reported-by: Amit Kapila and Shveta Malik Author: Hou Zhijie, Shveta Malik Reviewed-by: Amit Kapila, Bertrand Drouvot Discussion: https://postgr.es/m/OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAJpy0uCSS5zmdyUXhvw41HSdTbRqX1hbYqkOfHNj7qQ+2zn0AQ@mail.gmail.com	2024-04-12 15:10:41 +05:30
David Rowley	8461424fd7	Fixup various StringInfo function usages This adjusts various appendStringInfo* function calls to use a more appropriate and efficient function with the same behavior. For example, use appendStringInfoChar() when appending a single character rather than appendStringInfo() and appendStringInfoString() when no formatting is required rather than using appendStringInfo(). All adjustments made here are in code that's new to v17, so it makes sense to fix these now rather than wait a few years and make backpatching harder. Discussion: https://postgr.es/m/CAApHDvojY2UvMiO+9_55ArTj10P1LBNJyyoGB+C65BLDNT0GsQ@mail.gmail.com Reviewed-by: Nathan Bossart, Tom Lane	2024-04-10 11:53:32 +12:00
Amit Kapila	6f132ed693	Allow synced slots to have their inactive_since. This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com	2024-04-05 09:48:49 +05:30
Amit Kapila	2ec005b4e2	Ensure that the sync slots reach a consistent state after promotion without losing data. We were directly copying the LSN locations while syncing the slots on the standby. Now, it is possible that at some particular restart_lsn there are some running xacts, which means if we start reading the WAL from that location after promotion, we won't reach a consistent snapshot state at that point. However, on the primary, we would have already been in a consistent snapshot state at that restart_lsn so we would have just serialized the existing snapshot. To avoid this problem we will use the advance_slot functionality unless the snapshot already exists at the synced restart_lsn location. This will help us to ensure that snapbuilder/slot statuses are updated properly without generating any changes. Note that the synced slot will remain as RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a consistent snapshot state after which they will be marked as RS_PERSISTENT. Per buildfarm Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-04-03 14:04:59 +05:30
Amit Kapila	6ae701b437	Track invalidation_reason in pg_replication_slots. Till now, the reason for replication slot invalidation is not tracked directly in pg_replication_slots. A recent commit `007693f2a3` added 'conflict_reason' to show the reasons for slot conflict/invalidation, but only for logical slots. This commit adds a new column 'invalidation_reason' to show invalidation reasons for both physical and logical slots. And, this commit also turns 'conflict_reason' text column to 'conflicting' boolean column (effectively reverting commit `007693f2a3`). The 'conflicting' column is true for invalidation reasons 'rows_removed' and 'wal_level_insufficient' because those make the slot conflict with recovery. When 'conflicting' is true, one can now look at the new 'invalidation_reason' column for the reason for the logical slot's conflict with recovery. The new 'invalidation_reason' column will also be useful to track other invalidation reasons in the future commit. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com	2024-03-22 13:52:05 +05:30
Heikki Linnakangas	aafc05de1b	Refactor postmaster child process launching Introduce new postmaster_child_launch() function that deals with the differences in EXEC_BACKEND mode. Refactor the mechanism of passing information from the parent to child process. Instead of using different command-line arguments when launching the child process in EXEC_BACKEND mode, pass a variable-length blob of startup data along with all the global variables. The contents of that blob depend on the kind of child process being launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply inherited from the parent to child process. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:30 +02:00
Amit Kapila	bf279ddd1c	Introduce a new GUC 'standby_slot_names'. This patch provides a way to ensure that physical standbys that are potential failover candidates have received and flushed changes before the primary server making them visible to subscribers. Doing so guarantees that the promoted standby server is not lagging behind the subscribers when a failover is necessary. The logical walsender now guarantees that all local changes are sent and flushed to the standby servers corresponding to the replication slots specified in 'standby_slot_names' before sending those changes to the subscriber. Additionally, the SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to ensure that they process changes for failover slots only after physical slots specified in 'standby_slot_names' have confirmed WAL receipt for those. Author: Hou Zhijie and Shveta Malik Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-03-08 08:10:45 +05:30
Heikki Linnakangas	393b5599e5	Use MyBackendType in more places to check what process this is Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new AmProcess() macros that use MyBackendType. For consistency with the existing AmProcess() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi	2024-03-04 10:25:12 +02:00
Amit Kapila	b3f6b14cf4	Fixups for commit `93db6cbda0`. Ensure to set always-secure search path for both local and remote connections during slot synchronization, so that malicious users can't redirect user code (e.g. operators). In the passing, improve the name of define, remove spurious return statement, and a minor change in one of the comments. Author: Bertrand Drouvot and Shveta Malik Reviewed-by: Amit Kapila, Peter Smith Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com	2024-02-29 09:45:20 +05:30
Amit Kapila	93db6cbda0	Add a new slot sync worker to synchronize logical slots. By enabling slot synchronization, all the failover logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. The slot sync worker on the standby server pings the primary server at regular intervals to get the necessary failover logical slots information and create/update the slots locally. The slots that no longer require synchronization are automatically dropped by the worker. The nap time of the worker is tuned according to the activity on the primary. The slot sync worker waits for some time before the next synchronization, with the duration varying based on whether any slots were updated during the last cycle. A new parameter sync_replication_slots enables or disables this new process. On promotion, the slot sync worker is shut down by the startup process to drop any temporary slots acquired by the slot sync worker and to prevent the worker from trying to fetch the failover slots. A functionality to allow logical walsenders to wait for the physical will be done in a subsequent commit. Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-22 15:25:15 +05:30
Amit Kapila	801792e528	Improve ERROR/LOG messages added by commits `ddd5f4f54a` and `7a424ece48`. Additionally, in slotsync.c, replace one StringInfoData variable usage with a constant string to avoid palloc/pfree. Also, replace the inclusion of "logical.h" with "slot.h" to prevent the exposure of unnecessary implementation details. Reported-by: Kyotaro Horiguchi, Masahiko Sawada Author: Shveta Malik based on suggestions by Robert Haas and Amit Kapila Reviewed-by: Kyotaro Horiguchi, Amit Kapila Discussion: https://postgr.es/m/20240214.162652.773291409747353211.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20240219.134015.1888940527023074780.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/CAD21AoCYXhDYOQDAS-rhGasC2T+tYbV=8Y18o94sB=5AxcW+yA@mail.gmail.com	2024-02-22 11:17:00 +05:30
Amit Kapila	b987be39c3	Fix the incorrect format specifier used in commit `7a424ece48`. Author: Hou Zhijie Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com Discussion: https://postgr.es/m/OS0PR01MB5716CB015BAD807B29BC55BE944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-02-16 11:34:11 +05:30
Amit Kapila	7a424ece48	Add more LOG and DEBUG messages for slot synchronization. This provides more information about remote slots during synchronization which helps in debugging bugs and BF failures due to test case issues. We might later want to change the LOG message added by this patch to DEBUG1. Author: Hou Zhijie Reviewed-by: Amit Kapila, Bertrand Drouvot Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com Discussion: https://postgr.es/m/OS0PR01MB571633C23B2A4CAC5FB0371A944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-02-16 09:02:54 +05:30
Amit Kapila	ddd5f4f54a	Add a slot synchronization function. This commit introduces a new SQL function pg_sync_replication_slots() which is used to synchronize the logical replication slots from the primary server to the physical standby so that logical replication can be resumed after a failover or planned switchover. A new 'synced' flag is introduced in pg_replication_slots view, indicating whether the slot has been synchronized from the primary server. On a standby, synced slots cannot be dropped or consumed, and any attempt to perform logical decoding on them will result in an error. The logical replication slots on the primary can be synchronized to the hot standby by using the 'failover' parameter of pg-create-logical-replication-slot(), or by using the 'failover' option of CREATE SUBSCRIPTION during slot creation, and then calling pg_sync_replication_slots() on standby. For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby aka 'primary_slot_name' should be configured on the standby, and 'hot_standby_feedback' must be enabled on the standby. It is also necessary to specify a valid 'dbname' in the 'primary_conninfo'. If a logical slot is invalidated on the primary, then that slot on the standby is also invalidated. If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped but will be recreated on the standby in the next pg_sync_replication_slots() call provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on standby (which is the case currently). This situation may occur due to the following reasons: - The 'max_slot_wal_keep_size' on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - 'primary_slot_name' is temporarily reset to null and the physical slot is removed. The slot synchronization status on the standby can be monitored using the 'synced' column of pg_replication_slots view. A functionality to automatically synchronize slots by a background worker and allow logical walsenders to wait for the physical will be done in subsequent commits. Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-14 09:45:36 +05:30

43 commits