This could result in corruption of the init fork of an unlogged index
if the ambuildempty routine for that index used shared buffers to
create the init fork, which was true for brin, gin, gist, and hash
indexes.
Patch by me, based on an earlier patch by Michael Paquier, who also
reviewed this one. This also incorporates an idea from Artur
Zakirov.
Discussion: http://postgr.es/m/CACYUyc8yccE4xfxhqxfh_Mh38j7dRFuxfaK1p6dSNAEUakxUyQ@mail.gmail.com
My back-patch of the 9.4-era commit 44cd47c1d4 into 9.2 and 9.3 fixed
HPPA builds as expected, but it broke --disable-spinlocks builds, because
the dummy spinlock is initialized before the underlying semaphore
infrastructure is alive. In 9.4 and up this works because of commit
daa7527afc, which decoupled initialization of an slock_t variable
from access to the actual system semaphore object. The best solution
seems to be to back-port that patch, which should be a net win anyway
because it improves the usability of --disable-spinlocks builds in the
older branches; and it's been out long enough now to not be worrisome
from a stability perspective.
We weren't particularly maintaining barrier.h before 9.4, because nothing
was using it in those branches. Well, nothing until commit 37de8de9e got
back-patched. That broke 9.2 and 9.3 for some non-mainstream platforms
that we haven't been testing in the buildfarm, including icc on ia64,
HPPA, and Alpha.
This commit effectively back-patches commits e5592c61a, 89779bf2c,
and 747ca6697, though I did it just by copying the file (less copyright
date updates) rather than by cherry-picking those commits.
Per an attempt to run gaur and pademelon over old branches they've
not been run on since ~2013.
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory. This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.
Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason extend several copies of the
same logic, so centralize the link() callers.
This commit does not yet make use of the new functions; they're used in
a followup commit.
Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately WAL
replay so far has not necessarily flushed writes from shared buffers to
disk at that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.
To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can centrally be triggered from the code dealing with restoring
full page writes (XLogReadBufferForRedoExtended), in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.
Backpatch to 9.1, even if this currently is only known to trigger in
9.3+. Flushing earlier is more robust, and it is advantageous to keep
the branches similar.
Typical symptoms of this bug are errors like
'ERROR: index "..." contains unexpected zero page at block 0'
shortly after promoting a node.
Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process. That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash. That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.
Per report from Dmitry Vasilyev. It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of out native Windows support; probably some previously-unexplained trouble
reports trace to this.
A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow. That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.
Back-patch to all supported branches.
Tom Lane and Amit Kapila
On recent AIX it's necessary to configure gcc to use the native assembler
(because the GNU assembler hasn't been updated to handle AIX 6+). This
caused PG builds to fail with assembler syntax errors, because we'd try
to compile s_lock.h's gcc asm fragment for PPC, and that assembly code
relied on GNU-style local labels. We can't substitute normal labels
because it would fail in any file containing more than one inlined use of
tas(). Fortunately, that code is stable enough, and the PPC ISA is simple
enough, that it doesn't seem like too much of a maintenance burden to just
hand-code the branch offsets, removing the need for any labels.
Note that the AIX assembler only accepts "$" for the location counter
pseudo-symbol. The usual GNU convention is "."; but it appears that all
versions of gas for PPC also accept "$", so in theory this patch will not
break any other PPC platforms.
This has been reported by a few people, but Steve Underwood gets the credit
for being the first to pursue the problem far enough to understand why it
was failing. Thanks also to Noah Misch for additional testing.
Back-patch 9.3-era commit eeb6f37d89, to
improve the older branches' ability to cope with pg_dump dumping a large
number of tables.
I back-patched into 9.2 and 9.1, but not 9.0 as it would have required a
significant amount of refactoring, thus negating the argument that this
is by-now-well-tested code.
Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
Commit 2ce439f337 introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.
After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/. The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.
This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.
I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().
The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.
Back-patch to all active branches, like the prior commit.
Abhijit Menon-Sen and Tom Lane
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.
Back-patch to all supported versions.
Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
There was a window in RestoreBackupBlock where a page would be zeroed out,
but not yet locked. If a backend pinned and locked the page in that window,
it saw the zeroed page instead of the old page or new page contents, which
could lead to missing rows in a result set, or errors.
To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins,
zeroes, and locks the page, if it's not in the buffer cache already.
In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE,
to avoid breaking any 3rd party extensions that might use RBM_ZERO. More
importantly, this avoids renumbering the other enum values, which would
cause even bigger confusion in extensions that use ReadBufferExtended, but
haven't been recompiled.
Backpatch to all supported versions; this has been racy since hot standby
was introduced.
x86's memory barrier assembly was marked as clobbering "memory" but
not "cc" even though 'addl' sets various flags. As it turns out gcc on
x86 implicitly assumes "cc" on every inline assembler statement, so
it's not a bug. But as that's poorly documented and might get copied
to architectures or compilers where that's not the case, it seems
better to be precise.
Discussion: 20140919100016.GH4277@alap3.anarazel.de
To keep the code common, backpatch to 9.2 where explicit memory
barriers were introduced.
Some Sparc CPUs can be run in various coherence models, ranging from
RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always
run CPUs in TSO mode while in userland, but linux didn't use to and
the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK
were only correct under TSO. Fix that by adding the necessary memory
barrier instructions. On sparcv8+, which should be all relevant CPUs,
these are treated as NOPs if the current consistency model doesn't
require the barriers.
Discussion: 20140630222854.GW26930@awork2.anarazel.de
Will be backpatched to all released branches once a few buildfarm
cycles haven't shown up problems. As I've no access to sparc, this is
blindly written.
This was not changed in HEAD, but will be done later as part of a
pgindent run. Future pgindent runs will also do this.
Report by Tom Lane
Backpatch through all supported branches, but not HEAD
A fake relcache entry can "own" a SmgrRelation object, like a regular
relcache entry. But when it was free'd, the owner field in SmgrRelation
was not cleared, so it was left pointing to free'd memory.
Amazingly this apparently hasn't caused crashes in practice, or we would've
heard about it earlier. Andres found this with Valgrind.
Report and fix by Andres Freund, with minor modifications by me. Backpatch
to all supported versions.
In ordinary operation, VACUUM must be careful to take a cleanup lock on
each leaf page of a btree index; this ensures that no indexscans could
still be "in flight" to heap tuples due to be deleted. (Because of
possible index-tuple motion due to concurrent page splits, it's not enough
to lock only the pages we're deleting index tuples from.) In Hot Standby,
the WAL replay process must likewise lock every leaf page. There were
several bugs in the code for that:
* The replay scan might come across unused, all-zero pages in the index.
While btree_xlog_vacuum itself did the right thing (ie, nothing) with
such pages, xlogutils.c supposed that such pages must be corrupt and
would throw an error. This accounts for various reports of replication
failures with "PANIC: WAL contains references to invalid pages". To
fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
not to complain when we're doing this.
* btree_xlog_vacuum performed the extra locking if standbyState ==
STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
for hot standby queries until the database has reached consistency, and
we don't want to do the extra locking till then either, for fear of reading
corrupted pages (which bufmgr.c would complain about). Fix by exporting a
new function from xlog.c that will report whether we're actually in hot
standby replay mode.
* To ensure full coverage of the index in the replay scan, btvacuumscan
would emit a dummy WAL record for the last page of the index, if no
vacuuming work had been done on that page. However, if the last page
of the index is all-zero, that would result in corruption of said page,
since the functions called on it weren't prepared to handle that case.
There's no need to lock any such pages, so change the logic to target
the last normal leaf page instead.
The first two of these bugs were diagnosed by Andres Freund, the other one
by me. Fixes based on ideas from Heikki Linnakangas and myself.
This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
When acquiring a lock in fast-path mode, we must reset the locallock
object's lock and proclock fields to NULL. They are not necessarily that
way to start with, because the locallock could be left over from a failed
lock acquisition attempt earlier in the transaction. Failure to do this
led to all sorts of interesting misbehaviors when LockRelease tried to
clean up no-longer-related lock and proclock objects in shared memory.
Per report from Dan Wood.
In passing, modify LockRelease to elog not just Assert if it doesn't find
lock and proclock objects for a formerly fast-path lock, matching the code
in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a
bug but it will help in diagnosing any future bugs in this area.
Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry
to break out of their loops over the fastpath array once they've found the
sole matching entry. This was inconsistently done in some search loops
and not others.
Improve assorted related comments, too.
Back-patch to 9.2 where the fast-path mechanism was introduced.
If a tuple was frozen while its predicate locks mattered,
read-write dependencies could be missed, resulting in failure to
detect conflicts which could lead to anomalies in committed
serializable transactions.
This field was added to the tag when we still thought that it was
necessary to carry locks forward to a new version of an updated
row. That was later proven to be unnecessary, which allowed
simplification of the code, but elimination of xmin from the tag
was missed at the time.
Per report and analysis by Heikki Linnakangas.
Backpatch to 9.1.
In some cases with higher numbers of subtransactions
it was possible for us to incorrectly initialize
subtrans leading to complaints of missing pages.
Bug report by Sergey Konoplev
Analysis and fix by Andres Freund
Use the same gcc atomic functions as we do on newer ARM chips.
(Basically this is a copy and paste of the __arm__ code block,
but omitting the SWPB option since that definitely won't work.)
Back-patch to 9.2. The patch would work further back, but we'd also
need to update config.guess/config.sub in older branches to make them
build out-of-the-box, and there hasn't been demand for it.
Mark Salter
In situations where there are over 8MB of empty pages at the end of
a table, the truncation work for trailing empty pages takes longer
than deadlock_timeout, and there is frequent access to the table by
processes other than autovacuum, there was a problem with the
autovacuum worker process being canceled by the deadlock checking
code. The truncation work done by autovacuum up that point was
lost, and the attempt tried again by a later autovacuum worker. The
attempts could continue indefinitely without making progress,
consuming resources and blocking other processes for up to
deadlock_timeout each time.
This patch has the autovacuum worker checking whether it is
blocking any other thread at 20ms intervals. If such a condition
develops, the autovacuum worker will persist the work it has done
so far, release its lock on the table, and sleep in 50ms intervals
for up to 5 seconds, hoping to be able to re-acquire the lock and
try again. If it is unable to get the lock in that time, it moves
on and a worker will try to continue later from the point this one
left off.
While this patch doesn't change the rules about when and what to
truncate, it does cause the truncation to occur sooner, with less
blocking, and with the consumption of fewer resources when there is
contention for the table's lock.
The only user-visible change other than improved performance is
that the table size during truncation may change incrementally
instead of just once.
Backpatched to 9.0 from initial master commit at
b19e4250b4 -- before that the
differences are too large to be clearly safe.
Jan Wieck
This reverts commit c11130690d in favor of
actually fixing the problem: namely, that we should never have been
modifying the checkpoint record's nextXid at this point to begin with.
The nextXid should match the state as of the checkpoint's logical WAL
position (ie the redo point), not the state as of its physical position.
It's especially bogus to advance it in some wal_levels and not others.
In any case there is no need for the checkpoint record to carry the
same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by
LogStandbySnapshot, as any replay operation will already have adopted
that value as current.
This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
#6291 from Daniel Farina, in that if a checkpoint were in progress at the
instant of XID wraparound, the epoch bump would be lost as reported.
(And, of course, these days there's at least a 50-50 chance of a checkpoint
being in progress at any given instant.)
Diagnosed by me and independently by Andres Freund. Back-patch to all
branches supporting hot standby.
If the sleep is interrupted by a signal, we must recompute the remaining
time to wait; otherwise, a steady stream of non-wait-terminating interrupts
could delay return from WaitLatch indefinitely. This has been shown to be
a problem for the autovacuum launcher, and there may well be other places
now or in the future with similar issues. So we'd better make the function
robust, even though this'll add at least one gettimeofday call per wait.
Back-patch to 9.2. We might eventually need to fix 9.1 as well, but the
code is quite different there, and the usage of WaitLatch in 9.1 is so
limited that it's not clearly important to do so.
Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
heavily.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end. This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely. Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE. We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)
This code replaces a previous attempt towards the same goal that proved
to be unreliable. Back-patch to 9.1 where the previous patch was added.
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either. In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon. A replacement implementation will follow.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess. However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it. The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.
There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed. For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.
Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs. This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes. Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim. (There are other places that are assuming
this anyway, it turns out.)
In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it. The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.
In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
This provides a speedup of about 4X when NBuffers is large enough.
There is also a useful reduction in sinval traffic, since we
only do CacheInvalidateSmgr() once not once per fork.
Simon Riggs, reviewed and somewhat revised by Tom Lane
When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.
Noah Misch
Commit 6d90eaaa89 added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption. However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out. That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening. Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop. To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.
In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes. I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately. In any case it did nothing to improve the readability or
robustness of the code.
In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant. I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
This patch modifies the walwriter process so that, when it has not found
anything useful to do for many consecutive wakeup cycles, it extends its
sleep time to reduce the server's idle power consumption. It reverts to
normal as soon as it's done any successful flushes. It's still true that
during any async commit, backends check for completed, unflushed pages of
WAL and signal the walwriter if there are any; so that in practice the
walwriter can get awakened and returned to normal operation sooner than the
sleep time might suggest.
Also, improve the checkpointer so that it uses a latch and a computed delay
time to not wake up at all except when it has something to do, replacing a
previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for
reducing the server's power consumption when idle.
In passing, get rid of the dedicated latch for signaling the walwriter in
favor of using its procLatch, since that comports better with possible
generic signal handlers using that latch. Also, fix a pre-existing bug
with failure to save/restore errno in walwriter's signal handlers.
Peter Geoghegan, somewhat simplified by Tom
Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
Remove the following ports:
- dgux
- nextstep
- sunos4
- svr4
- ultrix4
- univel
These are obsolete and not worth rescuing. In most cases, there is
circumstantial evidence that they wouldn't work anymore anyway.
The previous code could cause a backend crash after BEGIN; SAVEPOINT a;
LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO
SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts
in other situations.
Report by Zoltán Böszörményi; patch review by Jeff Davis.
Postmaster sets max_safe_fds by testing how many open file descriptors it
can open, and that is normally inherited by all child processes at fork().
Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we
effectively ignored max_files_per_process on Windows, and always assumed
a conservative default of 32 simultaneous open files. That could have an
impact on performance, if you need to access a lot of different files
in a query. After this patch, the value is passed to child processes by
save/restore_backend_variables() among many other global variables.
It has been like this forever, but given the lack of complaints about it,
I'm not backpatching this.
Currently, the only way to see the numbers this gathers is via
EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
the stats collector and pg_stat_statements in subsequent patches.
Ants Aasma, reviewed by Greg Smith, with some further changes by me.