postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-10 10:11:28 -04:00

Author	SHA1	Message	Date
Michael Paquier	910e2aca12	Address set of issues with errno handling System calls mixed up in error code paths are causing two issues which several code paths have not correctly handled: 1) For write() calls, sometimes the system may return less bytes than what has been written without errno being set. Some paths were careful enough to consider that case, and assumed that errno should be set to ENOSPC, other calls missed that. 2) errno generated by a system call is overwritten by other system calls which may succeed once an error code path is taken, causing what is reported to the user to be incorrect. This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but this is let as future work, which is not targeted for back-branches anyway. Author: Michael Paquier Reviewed-by: Ashutosh Sharma Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz	2018-06-25 11:21:49 +09:00
Alvaro Herrera	d4429d50a2	Refrain from duplicating data in reorderbuffers If a walsender exits leaving data in reorderbuffers, the next walsender that tries to decode the same transaction would append its decoded data in the same spill files without truncating it first, which effectively duplicate the data. Avoid that by removing any leftover reorderbuffer spill files when a walsender starts. Backpatch to 9.4; this bug has been there from the very beginning of logical decoding. Author: Craig Ringer, revised by me Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada	2018-03-06 16:20:03 -03:00
Alvaro Herrera	132cd58d6d	Fix failure to delete spill files of aborted transactions Logical decoding's reorderbuffer.c may spill transaction files to disk when transactions are large. These are supposed to be removed when they become "too old" by xid; but file removal requires the boundary LSNs of the transaction to be known. The final_lsn is only set when we see the commit or abort record for the transaction, but nothing sets the value for transactions that crash, so the removal code misbehaves -- in assertion-enabled builds, it crashes by a failed assertion. To fix, modify the final_lsn of transactions that don't have a value set, to the LSN of the very latest change in the transaction. This causes the spilled files to be removed appropriately. Author: Atsushi Torikoshi Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp	2018-01-05 12:17:10 -03:00
Andrew Dunstan	87056267eb	Fix walsender timeouts when decoding a large transaction The logical slots have a fast code path for sending data so as not to impose too high a per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that a transaction with a large number of changes may take a very long time to be processed and sent to the client. This causes the walsender to ignore interrupts for potentially a long time and more importantly it will result in the walsender being killed due to timeout at the end of such a transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when the last keepalive check happened less than half the walsender timeout ago. Otherwise the slower code path will be taken. Backpatched to 9.4 Petr Jelinek, reviewed by Kyotaro HORIGUCHI, Yura Sokolov, Craig Ringer and Robert Haas. Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com	2017-12-14 11:32:00 -05:00
Robert Haas	32022e3f55	Fix more user-visible elog() calls. Michael Paquier discovered that this could be triggered via SQL; give a nicer message instead. Patch by Michael Paquier, reviewed by Masahiko Sawada. Discussion: http://postgr.es/m/CAB7nPqQtPg+LKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg@mail.gmail.com	2017-10-05 08:32:48 -04:00
Alvaro Herrera	182abe3138	Fix coding rules violations in walreceiver.c 1. Since commit `b1a9bad9e7` we had pstrdup() inside a spinlock-protected critical section; reported by Andreas Seltenreich. Turn those into strlcpy() to stack-allocated variables instead. Backpatch to 9.6. 2. Since commit `9ed551e0a4` we had a pfree() uselessly inside a spinlock-protected critical section. Tom Lane noticed in code review. Move down. Backpatch to 9.6. 3. Since commit `64233902d2` we had GetCurrentTimestamp() (a kernel call) inside a spinlock-protected critical section. Tom Lane noticed in code review. Move it up. Backpatch to 9.2. 4. Since commit `1bb2558046` we did elog(PANIC) while holding spinlock. Tom Lane noticed in code review. Release spinlock before dying. Backpatch to 9.2. Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu	2017-10-03 14:58:25 +02:00
Andres Freund	869a5869e5	Fix thinko introduced in `2bef06d516` et al. The callers for GetOldestSafeDecodingTransactionId() all inverted the argument for the argument introduced in `2bef06d516`. Luckily this appears to be inconsequential for the moment, as we wait for concurrent in-progress transaction when assembling a snapshot. Additionally this could only make a difference when adding a second logical slot, because only a pre-existing slot could cause an issue by lowering the returned xid dangerously much. Reported-By: Antonin Houska Discussion: https://postgr.es/m/32704.1496993134@localhost Backport: 9.4-, where `2bef06d516` was backpatched to.	2017-08-06 14:21:22 -07:00
Heikki Linnakangas	9c32c29ee7	Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure. Commit `14e8803f1` removed the locking in SyncRepWaitForLSN, but that introduced a race condition, where SyncRepWaitForLSN might see syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was not yet removed from the queue. That tripped the assertion, that the process should no longer be in the uqeue. Reorder the operations in SyncRepWakeQueue to remove the process from the queue first, and update syncRepState only after that, and add a memory barrier in between to make sure the operations are made visible to other processes in that order. Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro. Backpatch down to 9.5, where the locking was removed. Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org	2017-07-12 15:35:36 +03:00
Tom Lane	446914f6b6	Fix walsender to exit promptly if client requests shutdown. It's possible for WalSndWaitForWal to be asked to wait for WAL that doesn't exist yet. That's fine, in fact it's the normal situation if we're caught up; but when the client requests shutdown we should not keep waiting. The previous coding could wait indefinitely if the source server was idle. In passing, improve the rather weak comments in this area, and slightly rearrange some related code for better readability. Back-patch to 9.4 where this code was introduced. Discussion: https://postgr.es/m/14154.1498781234@sss.pgh.pa.us	2017-06-30 12:00:03 -04:00
Andres Freund	1ba1adf772	Fix leaking of small spilled subtransactions during logical decoding. When, during logical decoding, a transaction gets too big, it's contents get spilled to disk. Not just the top-transaction gets spilled, but also all of its subtransactions, even if they're not that large themselves. Unfortunately we didn't clean up such small spilled subtransactions from disk. Fix that, by keeping better track of whether a transaction has been spilled to disk. Author: Andres Freund Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello Discussion: https://postgr.es/m/1457621358.355011041@f382.i.mail.ru https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com Backpatch: 9.4-, where logical decoding was introduced	2017-06-18 19:13:50 -07:00
Tom Lane	c7e17ce4ee	Fix low-probability leaks of PGresult objects in the backend. We had three occurrences of essentially the same coding pattern wherein we tried to retrieve a query result from a libpq connection without blocking. In the case where PQconsumeInput failed (typically indicating a lost connection), all three loops simply gave up and returned, forgetting to clear any previously-collected PGresult object. Since those are malloc'd not palloc'd, the oversight results in a process-lifespan memory leak. One instance, in libpqwalreceiver, is of little significance because the walreceiver process would just quit anyway if its connection fails. But we might as well fix it. The other two instances, in postgres_fdw, are somewhat more worrisome because at least in principle the scenario could be repeated, allowing the amount of memory leaked to build up to something worth worrying about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS calls, as well as other calls that could potentially elog(ERROR), providing another way to exit without having cleared the PGresult. Here we need to add PG_TRY logic similar to what exists in quite a few other places in postgres_fdw. Coverity noted the libpqwalreceiver bug; I found the other two cases by checking all calls of PQconsumeInput. Back-patch to all supported versions as appropriate (9.2 lacks postgres_fdw, so this is really quite unexciting for that branch). Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us	2017-06-15 15:03:55 -04:00
Andres Freund	641a60b026	Unify SIGHUP handling between normal and walsender backends. Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0	2017-06-05 19:18:16 -07:00
Andres Freund	50581f2e74	Prevent possibility of panics during shutdown checkpoint. When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced	2017-06-05 19:18:16 -07:00
Andres Freund	e1319f64ec	Have walsenders participate in procsignal infrastructure. The non-participation in procsignal was a problem for both changes in master, e.g. parallelism not working for normal statements run in walsender backends, and older branches, e.g. recovery conflicts and catchup interrupts not working for logical decoding walsenders. This commit thus replaces the previous WalSndXLogSendHandler with procsignal_sigusr1_handler. In branches since `db0f6cad48` that can lead to additional SetLatch calls, but that only rarely seems to make a difference. Author: Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de Backpatch: 9.4, earlier commits don't seem to benefit sufficiently	2017-06-05 19:18:16 -07:00
Peter Eisentraut	1460515618	Fix new warnings from GCC 7 This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.	2017-05-16 08:43:55 -04:00
Andres Freund	fa9207c741	Avoid superfluous work for commits during logical slot creation. Before `955a684e04` logical decoding snapshot maintenance needed to cope with transactions it might not have seen in their entirety. For such transactions we'd to assume they modified the catalog (could have happened before we were watching), and thus a new snapshot had to be built, and distributed to concurrently running transactions. That's problematic because building a new snapshot isn't that cheap , especially as the the array of committed transactions needs to be sorted. When creating a slot on a server with a lot of transactions, this could make logical slot creation infeasibly expensive. After `955a684e04` there's no need to deal with transaction that aren't guaranteed to be fully observable. That allows to avoid building snapshots for transactions that haven't modified catalog, even before reaching consistency. While this isn't necessarily a bugfix, slot creation being impossible in some production workloads, is severe enough to warrant backpatching. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced	2017-05-13 15:06:40 -07:00
Andres Freund	b64a68e368	Fix race condition leading to hanging logical slot creation. The snapshot assembly during the creation of logical slots relied waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xact record listing transactions as ready, that have already WAL-logged an commit/abort record, as the locking just prevents the ProcArray to be adjusted, and the commit record has to be logged first. That lead to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. That hang resolved only if a xl_running_xacts record without any running transactions happened to be logged, far from certain on a busy server. It's impractical to prevent that via more heavyweight locking, the likelihood of deadlocks and significantly increased contention would be too big. Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid - that actually ends up significantly simplifying the code. That has two disadvantages: 1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish. 2) Previously a slot, unless the race occurred, could be created when the all transaction perceived as running based on commit/abort records, now we have to wait for the next xl_running_xacts record. To address that, trigger logging new xl_running_xacts record from within snapbuild.c exactly when necessary. Unfortunately snabuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced	2017-05-13 14:21:00 -07:00
Andres Freund	54270d7ebc	Don't use on-disk snapshots for exported logical decoding snapshot. Logical decoding stores historical snapshots on disk, so that logical decoding can restart without having to reconstruct a snapshot from scratch (for which the resources are not guaranteed to be present anymore). These serialized snapshots were also used when creating a new slot via the walsender interface, which can export a "full" snapshot (i.e. one that can read all tables, not just catalog ones). The problem is that the serialized snapshots are only useful for catalogs and not for normal user tables. Thus the use of such a serialized snapshot could result in an inconsistent snapshot being exported, which could lead to queries returning wrong data. This would only happen if logical slots are created while another logical slot already exists. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 15:29:43 -07:00
Andres Freund	47f896b5c2	Preserve required !catalog tuples while computing initial decoding snapshot. The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol. This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to it hit it. Ongoing decoding is not affected by this bug. To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok. In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock. Reported-By: Erik Rijkers Analyzed-By: Petr Jelinek Author: Petr Jelinek, heavily editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 13:13:37 -07:00
Andres Freund	81ff04deda	Zero padding in replication origin's checkpointed on disk-state. This seems to be largely cosmetic, avoiding valgrind bleats and the like. The uninitialized padding influences the CRC of the on-disk entry, but because it's also used when verifying the CRC, that doesn't cause spurious failures. Backpatch nonetheless. It's a bit unfortunate that contrib/test_decoding/sql/replorigin.sql doesn't exercise the checkpoint path, but checkpoints are fairly expensive on weaker machines, and we'd have to stop/start for that to be meaningful. Author: Andres Freund Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de Backpatch: 9.5, where replication origins were introduced	2017-04-23 15:54:58 -07:00
Peter Eisentraut	5adec6b54d	Spelling fixes From: Josh Soref <jsoref@gmail.com>	2017-03-14 13:45:51 -04:00
Fujii Masao	feb659cced	Make walsender always initialize the buffers. Walsender uses the local buffers for each outgoing and incoming message. Previously when creating replication slot, walsender forgot to initialize one of them and which can cause the segmentation fault error. To fix this issue, this commit changes walsender so that it always initialize them before it executes the requested replication command. Back-patch to 9.4 where replication slot was introduced. Problem report and initial patch by Stas Kelvich, modified by me. Report: https://www.postgresql.org/message-id/A1E9CB90-1FAC-4CAD-8DBA-9AA62A6E97C5@postgrespro.ru	2017-02-22 08:29:44 +09:00
Heikki Linnakangas	3aee34d41d	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:34:18 +02:00
Simon Riggs	99289e5060	Reset hot standby xmin after restart Hot_standby_feedback could be reset by reload and worked correctly, but if the server was restarted rather than reloaded the xmin was not reset. Force reset always if hot_standby_feedback is enabled at startup. Ants Aasma, Craig Ringer Reported-by: Ants Aasma	2017-01-26 20:09:18 +00:00
Magnus Hagander	bc53d71308	Fix base backup rate limiting in presence of slow i/o When source i/o on disk was too slow compared to the rate limiting specified, the system could end up with a negative value for sleep that it never got out of, which caused rate limiting to effectively be turned off. Discussion: https://postgr.es/m/CABUevEy_-e0YvL4ayoX8bH_Ja9w%2BBHoP6jUgdxZuG2nEj3uAfQ%40mail.gmail.com Analysis by me, patch by Antonin Houska	2016-12-19 10:16:02 +01:00
Magnus Hagander	6d779e05a0	Fix handling of symlinked pg_stat_tmp and pg_replslot This was already fixed in HEAD as part of `6ad8ac60` but was not backpatched. Also change the way pg_xlog is handled to be the same as the other directories. Patch from me with pg_xlog addition from Michael Paquier, test updates from David Steele.	2016-11-07 15:03:56 +01:00
Tom Lane	008a7dc327	libpqwalreceiver needs to link with libintl when using --enable-nls. The need for this was previously obscured even on picky platforms by the hack we used to support direct cross-module references in the transforms contrib modules. Now that that hack is gone, the undefined symbol is exposed, as reported by Robert Haas. Back-patch to 9.5 where we started to use -Wl,-undefined,dynamic_lookup. I'm a bit surprised that the older branches don't seem to contain any gettext references in this module, but since they don't fail at build time, they must not. (We might be able to get away with leaving this alone in 9.5/9.6, but I think it's cleaner if the reference gets resolved at link time.) Report: <CA+TgmoaHJKU5kcWZcYduATYVT7Mnx+8jUnycaYYL7OtCwCigug@mail.gmail.com>	2016-10-07 21:12:25 -04:00
Andres Freund	ce603a34a4	Correct logical decoding restore behaviour for subtransactions. Before initializing iteration over a subtransaction's changes, the last few changes were not spilled to disk. That's correct if the transaction didn't spill to disk, but otherwise... This bug can lead to missed or misorderd subtransaction contents when they were spilled to disk. Move spilling of the remaining in-memory changes to ReorderBufferIterTXNInit(), where it can easily be applied to the top transaction and, if present, subtransactions. Since this code had too many bugs already, noticeably increase test coverage. Fixes: #14319 Reported-By: Huan Ruan Discussion: <20160909012610.20024.58169@wrigleys.postgresql.org> Backport: 9,4-, where logical decoding was added	2016-10-03 22:13:10 -07:00
Alvaro Herrera	ac3aac3292	reorderbuffer: preserve errno while reporting error Clobbering errno during cleanup after an error is an oft-repeated, easy to make mistake. Deal with it here as everywhere else, by saving it aside and restoring after cleanup, before ereport'ing. In passing, add a missing errcode declaration in another ereport() call in the same file, which I noticed while skimming the file looking for similar problems. Backpatch to 9.4, where this code was introduced.	2016-08-19 14:38:55 -03:00
Andres Freund	de396a1cb3	Properly re-initialize replication slot shared memory upon creation. Slot creation did not clear all fields upon creation. After start the memory is zeroed, but when a physical replication slot was created in the shared memory of a previously existing logical slot, catalog_xmin would not be cleared. That in turn would prevent vacuum from doing its duties. To fix initialize all the fields. To make similar future bugs less likely, zero all of ReplicationSlotPersistentData, and re-order the rest of the initialization to be in struct member order. Analysis: Andrew Gierth Reported-By: md@chewy.com Author: Michael Paquier Discussion: <20160705173502.1398.70934@wrigleys.postgresql.org> Backpatch: 9.4, where replication slots were introduced	2016-08-17 13:15:03 -07:00
Simon Riggs	4fc8e2315d	Code cleanup in SyncRepWaitForLSN() Commit `14e8803f1` removed LWLocks when accessing MyProc->syncRepState but didn't clean up the surrounding code and comments. Cleanup and backpatch to 9.5, to keep code similar. Julien Rouhaud, improved by suggestion from Michael Paquier, implemented trivially by myself.	2016-08-12 12:49:38 +01:00
Tom Lane	71dca408c0	Don't propagate a null subtransaction snapshot up to parent transaction. This oversight could cause logical decoding to fail to decode an outer transaction containing changes, if a subtransaction had an XID but no actual changes. Per bug #14279 from Marko Tiikkaja. Patch by Marko based on analysis by Andrew Gierth. Discussion: <20160804191757.1430.39011@wrigleys.postgresql.org>	2016-08-07 13:15:55 -04:00
Tom Lane	8caf9fe625	Fix typo in ReorderBufferIterTXNInit(). This looks like it would cause changes from subtransactions to be missed by the iterator being constructed, if those changes had been spilled to disk previously. This implies that large subtransactions might be lost (in whole or in part) by logical replication. Found and fixed by Petru-Florin Mihancea, per bug #14208. Report: <20160622144830.5791.22512@wrigleys.postgresql.org>	2016-06-30 12:37:02 -04:00
Andres Freund	2e1b4adf39	Remember asking for feedback during walsender shutdown. Since `5a991ef8` we're explicitly asking for feedback from the receiving side when shutting down walsender, if there's not yet replicated data. Unfortunately we didn't remember (i.e. set waiting_for_ping_response to true) having asked for feedback, leading to scenarios in which replies were requested at a high frequency. I can't reproduce this problem on my laptop, I think that's because the problem requires a significant TCP window to manifest due to the !pq_is_send_pending() condition. But since this clearly is a bug, let's fix it. There's quite possibly more wrong than just this though. While fiddling with WalSndDone(), I rewrote a hard to understand comment about looking at the flush vs. the write position. Reported-By: Nick Cleaton, Magnus Hagander Author: Nick Cleaton Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com Backpatch: 9.4, were `5a991ef8` introduced the use of feedback messages during shutdown.	2016-04-28 22:11:18 -07:00
Tom Lane	e7a456174b	Fix core dump in ReorderBufferRestoreChange on alignment-picky platforms. When re-reading an update involving both an old tuple and a new tuple from disk, reorderbuffer.c was careless about whether the new tuple is suitably aligned for direct access --- in general, it isn't. We'd missed seeing this in the buildfarm because the contrib/test_decoding tests exercise this code path only a few times, and by chance all of those cases have old tuples with length a multiple of 4, which is usually enough to make the access to the new tuple's t_len safe. For some still-not-entirely-clear reason, however, Debian's sparc build gets a bus error, as reported by Christoph Berg; perhaps it's assuming 8-byte alignment of the pointer? The lack of previous field reports is probably because you need all of these conditions to trigger a crash: an alignment-picky platform (not Intel), a transaction large enough to spill to disk, an update within that xact that changes a primary-key field and has an odd-length old tuple, and of course logical decoding tracing the transaction. Avoid the alignment assumption by using memcpy instead of fetching t_len directly, and add a test case that exposes the crash on picky platforms. Back-patch to 9.4 where the bug was introduced. Discussion: <20160413094117.GC21485@msg.credativ.de>	2016-04-14 19:42:22 -04:00
Tom Lane	ce47112b6a	Adjust datatype of ReplicationState.acquired_by. It was declared as "pid_t", which would be fine except that none of the places that printed it in error messages took any thought for the possibility that it's not equivalent to "int". This leads to warnings on some buildfarm members, and could possibly lead to actually wrong error messages on those platforms. There doesn't seem to be any very good reason not to just make it "int"; it's only ever assigned from MyProcPid, which is int. If we want to cope with PIDs that are wider than int, this is not the place to start. Also, fix the comment, which seems to perhaps be a leftover from a time when the field was only a bool? Per buildfarm. Back-patch to 9.5 which has same issue.	2016-04-14 12:18:09 -04:00
Alvaro Herrera	c6d2fa1ab0	Fix broken variable declaration Author: Konstantin Knizhnik	2016-03-30 23:40:31 -03:00
Andres Freund	301cc3549c	Avoid unlikely data-loss scenarios due to rename() without fsync. Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Andres Freund	63b06e8fa6	Introduce durable_rename() and durable_link_or_rename(). Renaming a file using rename(2) is not guaranteed to be durable in face of crashes; especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, fsync the containing directory. This sequence is not generally adhered to currently; which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that. Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason extend several copies of the same logic, so centralize the link() callers. This commit does not yet make use of the new functions; they're used in a followup commit. Author: Michael Paquier, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Andres Freund	250e5bd712	Further improvements to `c8f621c43`. Coverity and inspection for the issue addressed in `fd45d16f` found some questionable code. Specifically coverity noticed that the wrong length was added in ReorderBufferSerializeChange() - without immediate negative consequences as the variable isn't used afterwards. During code-review and testing I noticed that a bit of space was wasted when allocating tuple bufs in several places. Thirdly, the debug memset()s in ReorderBufferGetTupleBuf() reduce the error checking valgrind can do. Backpatch: 9.4, like `c8f621c43`.	2016-03-07 14:24:52 -08:00
Andres Freund	5d1826fe76	Fix wrong allocation size in `c8f621c43`. In `c8f621c43` I forgot to account for MAXALIGN when allocating a new tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not cause active problems on a number of platforms because the affected pointer is already aligned, but others, like ppc and hppa, trigger this in the regression test, due to a debug memset clearing memory. Fix that. Backpatch: 9.4, like the previous commit.	2016-03-06 16:27:20 -08:00
Andres Freund	5990a034ad	logical decoding: Fix handling of large old tuples with replica identity full. When decoding the old version of an UPDATE or DELETE change, and if that tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or failed in more subtle ways in non-assert builds. Normally individual tuples aren't bigger than MaxHeapTupleSize, with big datums toasted. But that's not the case for the old version of a tuple for logical decoding; the replica identity is logged as one piece. With the default replica identity btree limits that to small tuples, but that's not the case for FULL. Change the tuple buffer infrastructure to separate allocate over-large tuples, instead of always going through the slab cache. This unfortunately requires changing the ReorderBufferTupleBuf definition, we need to store the allocated size someplace. To avoid requiring output plugins to recompile, don't store HeapTupleHeaderData directly after HeapTupleData, but point to it via t_data; that leaves rooms for the allocated size. As there's no reason for an output plugin to look at ReorderBufferTupleBuf->t_data.header, remove the field. It was just a minor convenience having it directly accessible. Reported-By: Adam Dratwiński Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com	2016-03-05 18:02:20 -08:00
Andres Freund	e76e365be9	logical decoding: old/newtuple in spooled UPDATE changes was switched around. Somehow I managed to flip the order of restoring old & new tuples when de-spooling a change in a large transaction from disk. This happens to only take effect when a change is spooled to disk which has old/new versions of the tuple. That only is the case for UPDATEs where he primary key changed or where replica identity is changed to FULL. The tests didn't catch this because either spooled updates, or updates that changed primary keys, were tested; not both at the same time. Found while adding tests for the following commit. Backpatch: 9.4, where logical decoding was added	2016-03-05 18:02:20 -08:00
Andres Freund	6e759cefe8	logical decoding: Tell reorderbuffer about all xids. Logical decoding's reorderbuffer keeps transactions in an LSN ordered list for efficiency. To make that's efficiently possible upper-level xids are forced to be logged before nested subtransaction xids. That only works though if these records are all looked at: Unfortunately we didn't do so for e.g. row level locks, which are otherwise uninteresting for logical decoding. This could lead to errors like: "ERROR: subxact logged without previous toplevel record". It's not sufficient to just look at row locking records, the xid could appear first due to a lot of other types of records (which will trigger the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny). So invent infrastructure to tell reorderbuffer about xids seen, when they'd otherwise not pass through reorderbuffer.c. Reported-By: Jarred Ward Bug: #13844 Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org Backpatch: 9.4, where logical decoding was added	2016-03-05 18:02:20 -08:00
Andres Freund	f8a75881f9	logical decoding: fix decoding of a commit's commit time. When adding replication origins in `5aa235042`, I somehow managed to set the timestamp of decoded transactions to InvalidXLogRecptr when decoding one made without a replication origin. Fix that, and the wrong type of the new commit_time variable. This didn't trigger a regression test failure because we explicitly don't show commit timestamps in the regression tests, as they obviously are variable. Add a test that checks that a decoded commit's timestamp is within minutes of NOW() from before the commit. Reported-By: Weiping Qu Diagnosed-By: Artur Zakirov Discussion: 56D4197E.9050706@informatik.uni-kl.de, 56D42918.1010108@postgrespro.ru Backpatch: 9.5, where `5aa235042` originates.	2016-03-02 23:43:42 -08:00
Alvaro Herrera	9ca3789191	Fix typos Author: Amit Langote	2016-02-29 18:11:58 -03:00
Tom Lane	aa7e04cb56	Clean up some lack-of-STRICT issues in the core code, too. A scan for missed proisstrict markings in the core code turned up these functions: brin_summarize_new_values pg_stat_reset_single_table_counters pg_stat_reset_single_function_counters pg_create_logical_replication_slot pg_create_physical_replication_slot pg_drop_replication_slot The first three of these take OID, so a null argument will normally look like a zero to them, resulting in "ERROR: could not open relation with OID 0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX functions. The other three will dump core on a null argument, though this is mitigated by the fact that they won't do so until after checking that the caller is superuser or has rolreplication privilege. In addition, the pg_logical_slot_get/peek[_binary]_changes family was intentionally marked nonstrict, but failed to make nullness checks on all the arguments; so again a null-pointer-dereference crash is possible but only for superusers and rolreplication users. Add the missing ARGISNULL checks to the latter functions, and mark the former functions as strict in pg_proc. Make that change in the back branches too, even though we can't force initdb there, just so that installations initdb'd in future won't have the issue. Since none of these bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX functions hardly misbehave at all), it seems sufficient to do this. In addition, fix some order-of-operations oddities in the slot_get_changes family, mostly cosmetic, but not the part that moves the function's last few operations into the PG_TRY block. As it stood, there was significant risk for an error to exit without clearing historical information from the system caches. The slot_get_changes bugs go back to 9.4 where that code was introduced. Back-patch appropriate subsets of the pg_proc changes into all active branches, as well.	2016-01-09 16:58:32 -05:00
Robert Haas	550e9c2305	Fix copy-and-paste error in logical decoding callback. This could result in the error context misidentifying where the error actually occurred. Craig Ringer	2015-12-18 12:22:24 -05:00
Magnus Hagander	28c366789e	Consistently set all fields in pg_stat_replication to null instead of 0 Previously the "sent" field would be set to 0 and all other xlog pointers be set to NULL if there were no valid values (such as when in a backup sending walsender).	2015-12-13 16:54:39 +01:00
Magnus Hagander	a9c56ff0e1	Properly initialize write, flush and replay locations in walsender slots These would leak random xlog positions if a walsender used for backup would a walsender slot previously used by a replication walsender. In passing also fix a couple of cases where the xlog pointer is directly compared to zero instead of using XLogRecPtrIsInvalid, noted by Michael Paquier.	2015-12-13 16:43:35 +01:00

1 2 3 4 5 ...

415 commits