postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-02-24 10:25:42 -05:00

Author	SHA1	Message	Date
Heikki Linnakangas	68a2e52bba	Replace the XLogInsert slots with regular LWLocks. The special feature the XLogInsert slots had over regular LWLocks is the insertingAt value that was updated atomically with releasing backends waiting on it. Add new functions to the LWLock API to do that, and replace the slots with LWLocks. This reduces the amount of duplicated code. (There's still some duplication, but at least it's all in lwlock.c now.) Reviewed by Andres Freund.	2014-03-21 15:10:48 +01:00
Alvaro Herrera	f88d4cfc9d	Setup error context callback for transaction lock waits With this in place, a session blocking behind another one because of tuple locks will get a context line mentioning the relation name, tuple TID, and operation being done on tuple. For example: LOG: process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms DETAIL: Process holding the lock: 11366. Wait queue: 11367. CONTEXT: while updating tuple (0,2) in relation "foo" STATEMENT: UPDATE foo SET value = 3; Most usefully, the new line is displayed by log entries due to log_lock_waits, although of course it will be printed by any other log message as well. Author: Christian Kruse, some tweaks by Álvaro Herrera Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas	2014-03-19 15:10:36 -03:00
Robert Haas	3bd261ca18	Improve shm_mq portability around MAXIMUM_ALIGNOF and sizeof(Size). Revise the original decision to expose a uint64-based interface and use Size everywhere possible. Avoid assuming that MAXIMUM_ALIGNOF is 8, or making any assumption about the relationship between that value and sizeof(Size). If MAXIMUM_ALIGNOF is bigger, we'll now insert padding after the length word; if it's smaller, we are now prepared to read and write the length word in chunks. Per discussion with Tom Lane.	2014-03-18 11:23:13 -04:00
Robert Haas	79a4d24f31	Make it easy to detach completely from shared memory. The new function dsm_detach_all() can be used either by postmaster children that don't wish to take any risk of accidentally corrupting shared memory; or by forked children of regular backends with the same need. This patch also updates the postmaster children that already do PGSharedMemoryDetach() to do dsm_detach_all() as well. Per discussion with Tom Lane.	2014-03-18 07:58:53 -04:00
Robert Haas	8722017bbc	Allow dynamic shared memory segments to be kept until shutdown. Amit Kapila, reviewed by Kyotaro Horiguchi, with some further changes by me.	2014-03-10 14:04:47 -04:00
Robert Haas	cb9a0c7987	Teach on_exit_reset() to discard pending cleanups for dsm. If a postmaster child invokes fork() and then calls on_exit_reset, that should be sufficient to let it exit() without breaking anything, but dynamic shared memory broke that by not updating on_exit_reset() to discard callbacks registered with dynamic shared memory segments. Per investigation of a complaint from Tom Lane.	2014-03-10 10:17:19 -04:00
Bruce Momjian	5024044a20	C comments: improve description of relfilenode uniqueness. Report by Antonin Houska.	2014-03-08 12:20:30 -05:00
Heikki Linnakangas	55566c9a74	Fix dangling smgr_owner pointer when a fake relcache entry is freed. A fake relcache entry can "own" a SmgrRelation object, like a regular relcache entry. But when it was free'd, the owner field in SmgrRelation was not cleared, so it was left pointing to free'd memory. Amazingly this apparently hasn't caused crashes in practice, or we would've heard about it earlier. Andres found this with Valgrind. Report and fix by Andres Freund, with minor modifications by me. Backpatch to all supported versions.	2014-03-07 13:28:52 +02:00
Bruce Momjian	0024a3a3b6	C comment update: relfilenode is only unique with a tablespace. Report from Antonin Houska.	2014-03-05 20:52:34 -05:00
Robert Haas	b89e151054	Introduce logical decoding. This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the affected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.	2014-03-03 16:32:18 -05:00
Heikki Linnakangas	f8ce16d0d2	Rename huge_tlb_pages to huge_pages, and improve docs. Christian Kruse	2014-03-03 20:52:48 +02:00
Robert Haas	dd1a3bccca	Show xid and xmin in pg_stat_activity and pg_stat_replication. Christian Kruse, reviewed by Andres Freund and myself, with further minor adjustments by me.	2014-02-25 12:34:04 -05:00
Tom Lane	fa1f0d7859	PGDLLIMPORT-ify MainLWLockArray, ProcDiePending, proc_exit_inprogress. These are needed in HEAD to make assorted contrib modules build on Windows. Now that all the MSVC and Mingw buildfarm members seem to be on the same page about the need for them, we can have some confidence that future problems of this ilk will be detected promptly; there seems nothing more to be learned by delaying this fix further. I chose to mark QueryCancelPending as well, since it's easy to imagine code that wants to touch ProcDiePending also caring about QueryCancelPending.	2014-02-16 20:12:43 -05:00
Peter Eisentraut	66c04c981d	Mark some more variables as static or include the appropriate header. Detected by clang's -Wmissing-variable-declarations. From: Andres Freund <andres@anarazel.de>	2014-02-08 21:21:46 -05:00
Robert Haas	858ec11858	Introduce replication slots. Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas	2014-01-31 22:45:36 -05:00
Heikki Linnakangas	1a3458b6d8	Allow using huge TLB pages on Linux (MAP_HUGETLB). This patch adds an option, huge_tlb_pages, which allows requesting the shared memory segment to be allocated using huge pages, by using the MAP_HUGETLB flag in mmap(). This can improve performance. The default is 'try', which means that we will attempt using huge pages, and fall back to non-huge pages if it doesn't work. Currently, only Linux has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as 'off'. In passing, don't try to round the mmap() size to a multiple of pagesize. mmap() doesn't require that, and there's no particular reason for PostgreSQL to do that either. When using MAP_HUGETLB, however, round the request size up to the nearest 2MB boundary. This is to work around a bug in some Linux kernel versions, but also to avoid wasting memory, because the kernel will round the size up anyway. Many people were involved in writing this patch, including Christian Kruse, Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund and me.	2014-01-29 14:08:30 +02:00
Robert Haas	ea9df812d8	Relax the requirement that all lwlocks be stored in a single array. This makes it possible to store lwlocks as part of some other data structure in the main shared memory segment, or in a dynamic shared memory segment. There is still a main LWLock array and this patch does not move anything out of it, but it provides necessary infrastructure for doing that in the future. This change is likely to increase the size of LWLockPadded on some platforms, especially 32-bit platforms where it was previously only 16 bytes. Patch by me. Review by Andres Freund and KaiGai Kohei.	2014-01-27 11:07:44 -05:00
Andrew Dunstan	7d7eee8bb7	Export a few more symbols required for test_shm_mq module. Patch from Amit Kapila.	2014-01-18 15:29:45 -05:00
Andrew Dunstan	708c529c7f	Export set_latch_on_sigusr1 symbol for Windows. Per buildfarm currawong and gripe from David Rowley.	2014-01-17 12:48:23 -05:00
Robert Haas	ed46758381	Logging running transactions every 15 seconds. Previously, we did this just once per checkpoint, but that could make Hot Standby take a long time to initialize. To avoid busying an otherwise-idle system, we don't do this if no WAL has been written since we did it last. Andres Freund	2014-01-15 12:41:20 -05:00
Tom Lane	061b079f89	Fix multiple bugs in index page locking during hot-standby WAL replay. In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that: * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this. * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode. * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead. 
The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.	2014-01-14 17:35:21 -05:00
Robert Haas	ec9037df26	Single-reader, single-writer, lightweight shared message queue. This code provides infrastructure for user backends to communicate relatively easily with background workers. The message queue is structured as a ring buffer and allows messages of arbitrary length to be sent and received. Patch by me. Review by KaiGai Kohei and Andres Freund.	2014-01-14 12:23:22 -05:00
Robert Haas	6ddd5137b2	Simple table of contents for a shared memory segment. This interface is intended to make it simple to divide a dynamic shared memory segment into different regions with distinct purposes. It therefore serves much the same purpose that ShmemIndex accomplishes for the main shared memory segment, but it is intended to be more lightweight. Patch by me. Review by Andres Freund.	2014-01-14 12:18:58 -05:00
Tom Lane	220b34331f	We don't need to include pg_sema.h in s_lock.h anymore. Minor improvement to commit `daa7527afc`: s_lock.h no longer has any need to mention PGSemaphoreData, so we can rip out the #include that supplies that. In a non-HAVE_SPINLOCKS build, this doesn't really buy much since we still need the #include in spin.h --- but everywhere else, this reduces #include footprint by some trifle, and helps keep the different locking facilities separate.	2014-01-08 20:58:22 -05:00
Robert Haas	daa7527afc	Reduce the number of semaphores used under --disable-spinlocks. Instead of allocating a semaphore from the operating system for every spinlock, allocate a fixed number of semaphores (by default, 1024) from the operating system and multiplex all the spinlocks that get created onto them. This could self-deadlock if a process attempted to acquire more than one spinlock at a time, but since processes aren't supposed to execute anything other than short stretches of straight-line code while holding a spinlock, that shouldn't happen. One motivation for this change is that, with the introduction of dynamic shared memory, it may be desirable to create spinlocks that last for less than the lifetime of the server. Without this change, attempting to use such facilities under --disable-spinlocks would quickly exhaust any supply of available semaphores. Quite apart from that, it's desirable to contain the quantity of semaphores needed to run the server simply on convenience grounds, since using too many may make it harder to get PostgreSQL running on a new platform, which is mostly the point of --disable-spinlocks in the first place. Patch by me; review by Tom Lane.	2014-01-08 18:58:00 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014. Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Robert Haas	001a573a20	Allow on-detach callbacks for dynamic shared memory segments. Just as backends must clean up their shared memory state (releasing lwlocks, buffer pins, etc.) before exiting, they must also perform any similar cleanups related to dynamic shared memory segments they have mapped before unmapping those segments. So add a mechanism to ensure that. Existing on_shmem_exit hooks include both "user level" cleanup such as transaction abort and removal of leftover temporary relations and also "low level" cleanup that forcibly released leftover shared memory resources. On-detach callbacks should run after the first group but before the second group, so create a new before_shmem_exit function for registering the early callbacks and keep on_shmem_exit for the regular callbacks. (An earlier draft of this patch added an additional argument to on_shmem_exit, but that had a much larger footprint and probably a substantially higher risk of breaking third party code for no real gain.) Patch by me, reviewed by KaiGai Kohei and Andres Freund.	2013-12-18 13:09:09 -05:00
Tatsuo Ishii	65d6e4cb5c	Add ALTER SYSTEM command to edit the server configuration file. Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii, Boszormenyi Zoltan, Andres Freund, Greg Smith and others.	2013-12-18 23:42:44 +09:00
Tom Lane	7db285afc9	Fix stale-pointer problem in fast-path locking logic. When acquiring a lock in fast-path mode, we must reset the locallock object's lock and proclock fields to NULL. They are not necessarily that way to start with, because the locallock could be left over from a failed lock acquisition attempt earlier in the transaction. Failure to do this led to all sorts of interesting misbehaviors when LockRelease tried to clean up no-longer-related lock and proclock objects in shared memory. Per report from Dan Wood. In passing, modify LockRelease to elog not just Assert if it doesn't find lock and proclock objects for a formerly fast-path lock, matching the code in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a bug but it will help in diagnosing any future bugs in this area. Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry to break out of their loops over the fastpath array once they've found the sole matching entry. This was inconsistently done in some search loops and not others. Improve assorted related comments, too. Back-patch to 9.2 where the fast-path mechanism was introduced.	2013-11-27 18:10:00 -05:00
Robert Haas	d2aecaea15	Modify dynamic shared memory code to use Size rather than uint64. This is more consistent with what we do elsewhere.	2013-10-28 12:12:06 -04:00
Robert Haas	ea91a6be89	Remove IRIX port. Development of IRIX has been discontinued, and support is scheduled to end in December of 2013. Therefore, there will be no supported versions of this operating system by the time PostgreSQL 9.4 is released. Furthermore, we have no maintainer for this platform.	2013-10-18 08:14:21 -04:00
Robert Haas	81051a86bc	Remove spinlock support for SINIX, Sun3, and NS32K. All of these platforms are very much obsolete. As far as I can determine, the last version of SINIX, later renamed Reliant, occurred some time between 2002 and 2005. The last release of SunOS that would run on a sun3 was released in November of 1991; the last release of OpenBSD which supported that platform was in 2001. The highest clock speed of any processor in the family was 25MHz. The NS32K (national semiconductor 320xx) architecture was retired in 1990. Support can be re-added if a maintainer emerges for any of these platforms, but it seems unlikely. Reviewed by Andres Freund.	2013-10-17 12:02:05 -04:00
Robert Haas	0ac5e5a7e1	Allow dynamic allocation of shared memory segments. Patch by myself and Amit Kapila. Design help from Noah Misch. Review by Andres Freund.	2013-10-09 21:05:02 -04:00
Kevin Grittner	c01262a824	Eliminate xmin from hash tag for predicate locks on heap tuples. If a tuple was frozen while its predicate locks mattered, read-write dependencies could be missed, resulting in failure to detect conflicts which could lead to anomalies in committed serializable transactions. This field was added to the tag when we still thought that it was necessary to carry locks forward to a new version of an updated row. That was later proven to be unnecessary, which allowed simplification of the code, but elimination of xmin from the tag was missed at the time. Per report and analysis by Heikki Linnakangas. Backpatch to 9.1.	2013-10-07 14:16:54 -05:00
Alvaro Herrera	15732b34e8	Add WaitForLockers in lmgr, refactoring index.c code. This is in support of a future REINDEX CONCURRENTLY feature. Michael Paquier	2013-10-01 17:57:01 -03:00
Alvaro Herrera	1247ea28cb	Remove `proc` argument from LockCheckConflicts. This has been unused since commit `8563ccae2c`. Noted by Antonin Houska.	2013-09-16 22:14:14 -03:00
Robert Haas	cc52d5b33f	Expose fsync_fname as a public API. Andres Freund	2013-09-04 11:15:00 -04:00
Alvaro Herrera	8b290f3115	Update obsolete comment	2013-09-03 16:53:16 -04:00
Heikki Linnakangas	b03d196be0	Use a non-locking initial test in TAS_SPIN on x86_64. Testing done in 2011 by Tom Lane concluded that this is a win on Intel Xeons and AMD Opterons, but it was not changed back then, because of an old comment in tas() that suggested that it's a huge loss on older Opterons. However, we didn't have separate TAS() and TAS_SPIN() macros back then, so the comment referred to doing a non-locked initial test even on the first access, in the uncontended case. I don't have access to older Opterons, but I'm pretty sure that doing an initial unlocked test is unlikely to be a loss while spinning, even though it might be for the first access. We probably should do the same on 32-bit x86, but I'm afraid of changing it without any testing. Hence just add a note to the x86 implementation suggesting that we probably should do the same there.	2013-08-29 14:04:37 +03:00
Robert Haas	090d0f2050	Allow discovery of whether a dynamic background worker is running. Using the infrastructure provided by this patch, it's possible either to wait for the startup of a dynamically-registered background worker, or to poll the status of such a worker without waiting. In either case, the current PID of the worker process can also be obtained. As usual, worker_spi is updated to demonstrate the new functionality. Patch by me. Review by Andres Freund.	2013-08-28 14:08:13 -04:00
Tom Lane	89779bf2c8	Fix a few problems in barrier.h. On HPPA, implement pg_memory_barrier() as pg_compiler_barrier(), which should be correct since this arch doesn't do memory access reordering, and is anyway better than the completely-nonfunctional-on-this-arch dummy_spinlock code. (But note this patch only fixes things for gcc, not for builds with HP's compiler.) Also, fix incorrect default definition of pg_memory_barrier as a macro requiring an argument. Also, fix incorrect spelling of "#elif" as "#else if" in icc code path (spotted by pgindent). This doesn't come close to fixing all of the functional and stylistic deficiencies in barrier.h, but at least it un-breaks my personal build. Now that we're actually using barriers in the code, this file is going to need some serious attention.	2013-07-17 18:38:20 -04:00
Robert Haas	7f7485a0cd	Allow background workers to be started dynamically. There is a new API, RegisterDynamicBackgroundWorker, which allows an ordinary user backend to register a new background worker during normal running. This means that it's no longer necessary for all background workers to be registered during processing of shared_preload_libraries, although the option of registering workers at that time remains available. When a background worker exits and will not be restarted, the slot previously used by that background worker is automatically released and becomes available for reuse. Slots used by background workers that are configured for automatic restart can't (yet) be released without shutting down the system. This commit adds a new source file, bgworker.c, and moves some of the existing control logic for background workers there. Previously, there was little enough logic that it made sense to keep everything in postmaster.c, but not any more. This commit also makes the worker_spi contrib module into an extension and adds a new function, worker_spi_launch, which can be used to demonstrate the new facility.	2013-07-16 13:02:15 -04:00
Heikki Linnakangas	e5592c61ad	Fix memory barrier support on icc on ia64, 2nd attempt. Itanium doesn't have the mfence instruction - that's a 386 thing. Use the "mf" instruction instead. This reverts the previous commit to add "#include <emmintrinsic.h>"; the problem was not with a missing #include.	2013-07-09 11:34:18 +03:00
Heikki Linnakangas	6052bceba5	Add #include needed for _mm_mfence() intrinsic on ia64. Hopefully this fixes the build failure on buildfarm member dugong.	2013-07-09 10:29:43 +03:00
Heikki Linnakangas	9a20a9b21b	Improve scalability of WAL insertions. This patch replaces WALInsertLock with a number of WAL insertion slots, allowing multiple backends to insert WAL records to the WAL buffers concurrently. This is particularly useful for parallel loading large amounts of data on a system with many CPUs. This has one user-visible change: switching to a new WAL segment with pg_switch_xlog() now fills the remaining unused portion of the segment with zeros. This potentially adds some overhead, but it has been a very common practice by DBAs to clear the "tail" of the segment with an external pg_clearxlogtail utility anyway, to make the WAL files compress better. With this patch, it's no longer necessary to do that. This patch adds a new GUC, xloginsert_slots, to tune the number of WAL insertion slots. Performance testing suggests that the default, 8, works pretty well for all kinds of workloads, but I left the GUC in place to allow others with different hardware to test that easily. We might want to remove that before release. Reviewed by Andres Freund.	2013-07-08 11:23:56 +03:00
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue that has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Peter Eisentraut	129759d6a5	Fix cpluspluscheck in checksum code. C++ is more picky about comparing signed and unsigned integers.	2013-06-30 10:25:43 -04:00
Simon Riggs	1f09121b4e	Ensure no xid gaps during Hot Standby startup. In some cases with higher numbers of subtransactions it was possible for us to incorrectly initialize subtrans leading to complaints of missing pages. Bug report by Sergey Konoplev. Analysis and fix by Andres Freund.	2013-06-23 11:05:02 +01:00
Jeff Davis	b8fd1a09f3	Add buffer_std flag to MarkBufferDirtyHint(). MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.	2013-06-17 08:02:12 -07:00
Tom Lane	f04216341d	Refactor checksumming code to make it easier to use externally. pg_filedump and other external utility programs are likely to want to be able to check Postgres page checksums. To avoid messy duplication of code, move the checksumming functionality into an exported header file, much as we did awhile back for the CRC code. In passing, get rid of an unportable assumption that a static char[] array will be word-aligned, and do some other minor code beautification.	2013-06-13 22:35:56 -04:00

864 commits