The syslogger will write out the current stderr and csvlog names, if
it's running and there are any, to a new file in the data directory
called "current_logfiles". We take care to remove this file when it
might no longer be valid (but not at shutdown). The function
pg_current_logfile() can be used to read the entries in the file.
Gilles Darold, reviewed and modified by Karl O. Pinc, Michael
Paquier, and me. Further review by Álvaro Herrera and Christoph Berg.
Currently, the whole row is shown without column names. Instead,
adopt a style similar to _bt_check_unique() in ExecFindPartition()
and show the failing key: (key1, ...) = (val1, ...).
Amit Langote, per a complaint from Simon Riggs. Reviewed by me;
I also adjusted the grammar in one of the comments.
Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp
Likewise in RestoreSnapshot(). Do so by copying between the user buffer
and a stack buffer of known alignment. Back-patch to 9.6, where this
last applies cleanly. In master, the select_parallel test dies with
SIGBUS on "Oracle Solaris 10 1/13 s10s_u11wos_24a SPARC", building
32-bit with gcc 4.9.2. In 9.6 and 9.5, the buffers in question happen
to be sufficiently aligned, and this change is mere insurance against
future 9.6 changes or extension code compromising that.
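The technique, in outline (an illustrative sketch, not the actual
RestoreSnapshot() code; the helper name is made up):

    #include "postgres.h"        /* int64; c.h also pulls in <string.h> */

    static int64
    read_int64_unaligned(const char *buf)
    {
        int64       val;

        /*
         * Copy through a stack variable of known alignment instead of
         * dereferencing a possibly-misaligned pointer, which can raise
         * SIGBUS on strict-alignment hardware such as SPARC.
         */
        memcpy(&val, buf, sizeof(val));
        return val;
    }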
In the previous commit I'd made MemoryContextContains() use
GetMemoryChunkContext(), but that causes trouble when the passed
pointer isn't allocated in any memory context - that's probably
something we shouldn't do, but the previous commit isn't a place for a
"policy" change.
The README was written as a "historical account", and that style
hasn't aged particularly well. Rephrase it to describe the current
situation, instead of having various version-specific comments.
This also updates the description of how allocated chunks are
associated with their corresponding context, the method of which has
changed in the preceding commit.
Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
The new slab allocator needs different per-allocation information than
the classical aset.c. The definition in 58b25e981 wasn't sufficiently
careful on 32-bit platforms with 8-byte alignment, leading to buildfarm
failures. That's not entirely easy to fix by just adjusting the
definition.
As slab.c doesn't actually need the size part(s) of the common header
(all chunks are equally sized, after all), it seems better to instead
reduce the header to the part needed by all allocators, namely which
context an allocation belongs to. That has the advantage of reducing
the overhead of slab allocations, and also allows for more flexibility
in future allocators.
To avoid spreading the logic about accessing a chunk's context around,
centralize it in GetMemoryChunkContext(), which makes it possible to
delete a good number of lines.
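For illustration, the centralized lookup boils down to reading the
owning context stored directly in front of the chunk, roughly like
this (simplified; the real function also has validity assertions):

    static inline MemoryContext
    GetMemoryChunkContext(void *pointer)
    {
        /* the owning MemoryContext is stored immediately before the chunk */
        return *(MemoryContext *) (((char *) pointer) - sizeof(void *));
    }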
A followup commit will revise the mmgr/README portion about
StandardChunkHeader, and more.
Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
PQerrorMessage() returns an error message with a trailing newline, but
in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have
the error message without that for emitting via ereport(). To simplify
that, add a function pchomp() that returns a pstrdup'ed string with the
trailing newline characters removed.
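A sketch of the intended behavior (the committed definition may differ
in detail):

    char *
    pchomp(const char *in)
    {
        size_t      n = strlen(in);

        /* strip trailing newlines, then copy into palloc'd memory */
        while (n > 0 && in[n - 1] == '\n')
            n--;
        return pnstrdup(in, n);
    }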
The default general-purpose aset.c-style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small and have varying
lifetimes. There tends to be a lot of fragmentation, and larger
allocations always go directly to libc rather than having their cost
amortized over several pallocs.
These problems led to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Freeing, especially,
can easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this usage pattern are more efficient
in both space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocation and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc() and carving
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
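Schematically, the block-selection logic looks something like this
(names and structures are illustrative, not slab.c's actual ones):

    /*
     * freelist[i] links blocks that currently have exactly i free chunks.
     * Allocating from the lowest nonempty i > 0, i.e. the fullest block
     * that still has room, leaves mostly-empty blocks untouched so they
     * can become completely empty and be free()d back to the OS.
     */
    for (i = 1; i <= chunksPerBlock; i++)
    {
        if (freelist[i] != NULL)
        {
            block = freelist[i];
            break;
        }
    }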
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There are likely further potential uses of this allocator besides
reorderbuffer.c.
There are potential further optimizations of the new slab.c; in
particular, the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
An upcoming patch introduces a new type of memory context. To avoid
duplicating debugging infrastructure within aset.c, move useful pieces
to memdebug.[ch].
While touching aset.c, fix printf format code in AllocFree* debug
macros.
Author: Tomas Vondra
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.
While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files". While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern. (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)
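The standard pattern, for reference (the specific headers here are
just examples):

    #include "postgres.h"

    #include <sys/stat.h>
    #include <unistd.h>

    #include "miscadmin.h"
    #include "storage/fd.h"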
I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
If someone were to try to call one of the enum comparison functions
using DirectFunctionCallN, it would very likely seem to work, because
only in unusual cases does enum_cmp_internal() need to access the
typcache. But once such a case occurred, code like that would crash
with a null pointer dereference. To make an oversight of that sort
less likely to escape detection, add a non-bypassable Assert that
fcinfo->flinfo isn't NULL.
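Schematically, the guard is of this shape (wording illustrative, not
necessarily the exact committed code):

    if (fcinfo->flinfo == NULL)
        elog(ERROR, "enum comparison called without flinfo");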
Discussion: https://postgr.es/m/25226.1487900067@sss.pgh.pa.us
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.
This required including datatype/timestamp.h in a couple more places
than before. I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose. Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
We don't need it any more.
pg_controldata continues to report that date/time type storage is
"64-bit integers", but that's now a hard-wired behavior not something
it sees in the data. This avoids breaking pg_upgrade, and perhaps other
utilities that inspect pg_control this way. Ditto for pg_resetwal.
I chose to remove the "bigint_timestamps" output column of
pg_control_init(), though, as that function hasn't been around long
and probably doesn't have ossified users.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Columns with array pseudotypes have not been identified as arrays, so
they have been rendered as strings in the json and jsonb conversion
routines. This change allows them to be rendered as json arrays, making
it possible to deal correctly with the anyarray columns in pg_stats.
Be specific about which pattern is being complained of, and avoid saying
"it's not supported in to_date", which is just confusing if the error is
actually coming out of to_timestamp. We can phrase it as "is only
supported in to_char", instead. Also, use the term "formatting field" not
"format pattern", because other error messages in the same file prefer that
terminology. (This isn't terribly consistent with the documentation, so
maybe we should change all these error messages?)
A new function dsa_allocate_extended now takes flags which indicate
that huge allocations should be permitted, that out-of-memory
conditions should not throw an error, and/or that the returned memory
should be zero-filled, just like MemoryContextAllocExtended.
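For example (a hedged sketch; see dsa.h for the authoritative flag
names and values):

    dsa_pointer dp;

    dp = dsa_allocate_extended(area, size,
                               DSA_ALLOC_NO_OOM | DSA_ALLOC_ZERO);
    if (!DsaPointerIsValid(dp))
    {
        /* allocation failed; no error thrown because of DSA_ALLOC_NO_OOM */
    }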
Commit 9acb85597f, which added
dsa_allocate0, was broken because it failed to account for the
possibility that dsa_allocate() might return InvalidDsaPointer.
This fixes that problem along the way.
Thomas Munro, with some comment changes by me.
Discussion: http://postgr.es/m/CA+Tgmobt7CcF_uQP2UQwWmu4K9qCHehMJP9_9m1urwP8hbOeHQ@mail.gmail.com
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, Haribabu Kommi, and me.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size, which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
The S/390 members of the buildfarm are showing failures indicating
that they're having trouble with the rint() calls I added yesterday.
There's no good reason for that, and I wonder if it is a compiler bug
similar to the one we worked around in d9476b838. Try to fix it using
the same method as before, namely to store the result of rint() back
into a "double" variable rather than immediately converting to int64.
(This isn't entirely waving a dead chicken, since on machines with
wider-than-double float registers, the extra store forces a width
conversion. I don't know if S/390 is like that, but it seems worth
trying.)
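That is, something along these lines (schematic, not the exact
committed code):

    double      rounded;

    /*
     * Assigning rint()'s result to a double variable forces a store,
     * which narrows any wider-than-double intermediate value on machines
     * with wide float registers.
     */
    rounded = rint(seconds * USECS_PER_SEC);
    result = (int64) rounded;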
In passing, merge duplicate ereport() calls in float8_timestamptz().
Per buildfarm.
When converting a float value to integer microseconds, we should be careful
to round the value to the nearest integer, typically with rint(); simply
assigning to an int64 variable will truncate, causing apparently off-by-one
values in cases that should work. Most places in the datetime code got
this right, but not these two.
float8_timestamptz() is new as of commit e511d878f (9.6). Previous
versions effectively depended on interval_mul() to do roundoff correctly,
which it does, so this fixes an accuracy regression in 9.6.
The problem in make_interval() dates to its introduction in 9.4. Aside
from being careful to round not truncate, let's incorporate the hours and
minutes inputs into the result with exact integer arithmetic, rather than
risk introducing roundoff error where there need not have been any.
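Roughly (schematic; constant names as in datatype/timestamp.h):

    /* hours and minutes contribute exactly; only seconds need rounding */
    result->time = hours * ((int64) SECS_PER_HOUR * USECS_PER_SEC) +
                   mins * ((int64) SECS_PER_MINUTE * USECS_PER_SEC) +
                   (int64) rint(secs * USECS_PER_SEC);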
float8_timestamptz() problem reported by Erik Nordström, though this is
not his proposed patch. make_interval() problem found by me.
Discussion: https://postgr.es/m/CAHuQZDS76jTYk3LydPbKpNfw9KbACmD=49dC4BrzHcfPv6yA1A@mail.gmail.com
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images). Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.
Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas. Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
The problem with the original coding here is that we might receive (and
clear) a relcache invalidation signal for the target relation down inside
one of the index_open calls we're doing. Since the target is open, we
would not drop the relcache entry, just reset its rd_indexvalid and
rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and
would eventually cache and return potentially-obsolete attribute bitmaps.
The case where this matters is where the inval signal was from a CREATE
INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed
column. (In all other cases, the lock we hold on the target rel should
prevent any concurrent change in index state.) Even just returning the
stale attribute bitmap is not such a problem, because it shouldn't matter
during the transaction in which we receive the signal. What hurts is
caching the stale data, because it can survive into later transactions,
breaking CREATE INDEX CONCURRENTLY's expectation that later transactions
will not create new broken HOT chains. The upshot is that there's a window
for building corrupted indexes during CREATE INDEX CONCURRENTLY.
This patch fixes the problem by rechecking that the set of index OIDs
is still the same at the end of RelationGetIndexAttrBitmap() as it was
at the start. If not, we loop back and try again. That's a little
more than is strictly necessary to fix the bug --- in principle, we
could return the stale data but not cache it --- but it seems like a
bad idea on general principles for relcache to return data it knows
is stale.
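In outline, the recheck works like this (heavily simplified relative
to the real relcache.c code):

    restart:
        indexoidlist = RelationGetIndexList(relation);

        /* ... open each index and accumulate the attribute bitmaps ... */

        /*
         * If an invalidation arrived meanwhile, the index set may have
         * changed; recompute rather than caching possibly-stale bitmaps.
         */
        newindexoidlist = RelationGetIndexList(relation);
        if (!equal(indexoidlist, newindexoidlist))
        {
            list_free(indexoidlist);
            list_free(newindexoidlist);
            goto restart;
        }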
There might be more hazards of the same ilk, or there might be a better
way to fix this one, but this patch definitely improves matters and seems
unlikely to make anything worse. So let's push it into today's releases
even as we continue to study the problem.
Pavan Deolasee and myself
Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
Commit 665d1fad9 introduced rd_pkindex, and made RelationGetIndexList
responsible for updating it, but didn't bother to fix
RelationGetIndexList's header comment to say so.
There is no particularly good reason to limit this value to 1000,
so increase the limit to INT_MAX / 2, the same limit we use for
shared_buffers. It's not clear how much practical effect larger
settings will have, but there seems no harm in letting people try it.
Jim Nasby, less a comment change I stripped out.
Discussion: http://postgr.es/m/f6e58a22-030b-eb8a-5457-f62fb08d701c@BlueTreble.com
Doing so doesn't seem to be within the purpose of the per-user
connection limits, and has particularly unfortunate effects in
conjunction with parallel queries.
Back-patch to 9.6, where parallel queries were introduced.
David Rowley, reviewed by Robert Haas and Albe Laurenz.
The "Simplify tape block format" commit ignored the rule that blocks
returned by ltsGetFreeBlock() must be written out in the same order, at
least in the first write pass. To fix, relax that requirement, by making
ltsWriteBlock() to detect if it's about to create a "hole" in the
underlying BufFile, and fill it with zeros instead.
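Schematically (a simplified sketch; field names are illustrative):

    /* if blocknum is past the current end of file, zero-fill the gap */
    while (blocknum > lts->nBlocksWritten)
    {
        char        zerobuf[BLCKSZ];

        MemSet(zerobuf, 0, sizeof(zerobuf));
        if (BufFileWrite(lts->pfile, zerobuf, sizeof(zerobuf)) != sizeof(zerobuf))
            elog(ERROR, "could not write block %ld of temporary file",
                 lts->nBlocksWritten);
        lts->nBlocksWritten++;
    }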
Reported, analysed, and reviewed by Peter Geoghegan.
Discussion: https://www.postgresql.org/message-id/CAM3SWZRWdNtkhiG0GyiX_1mUAypiK3dV6-6542pYe2iEL-foTA@mail.gmail.com
Split the existing CatalogUpdateIndexes into two different routines,
CatalogTupleInsert and CatalogTupleUpdate, which each perform the heap
insert or update plus the index update. This removes over 300 lines of
boilerplate code all over src/backend/catalog/ and src/backend/commands.
The resulting code is much more pleasing to the eye.
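The typical call-site change looks like this (illustrative):

    /* before */
    simple_heap_update(rel, &tup->t_self, tup);
    CatalogUpdateIndexes(rel, tup);

    /* after */
    CatalogTupleUpdate(rel, &tup->t_self, tup);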
Also, by encapsulating what happens in detail during an UPDATE, this
facilitates the upcoming WARM patch, which is going to add a few more
lines to the update case, making the boilerplate even more boring.
The original CatalogUpdateIndexes is removed; there was only one use
left, and since it's just three lines, we may as well expand it in place
there. We could keep it, but WARM is going to break all the UPDATE
out-of-core callsites anyway, so there seems to be no benefit in doing
so.
Author: Pavan Deolasee
Discussion: https://www.postgr.es/m/CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com
When I wrote commit ab1f0c822, I really missed the castNode() macro that
Peter E. had proposed shortly before. This back-fills the uses I would
have put it to. It's probably not all that significant, but there are
more assertions here than there were before, and conceivably they will
help catch any bugs associated with those representation changes.
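For instance (illustrative):

    fexpr = (FuncExpr *) node;          /* before: unchecked cast */
    fexpr = castNode(FuncExpr, node);   /* after: Asserts node is a FuncExpr */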
I left behind a number of usages like "(Query *) copyObject(query_var)".
Those could have been converted as well, but Peter has proposed another
notational improvement that would handle copyObject cases automatically,
so I let that be for now.
!foo means "the tsvector does not contain foo", and therefore it should
match an empty tsvector. ts_match_vq() overenthusiastically supposed
that an empty tsvector could never match any query, so it forcibly
returned FALSE, the wrong answer. Remove the premature optimization.
Our behavior on this point was inconsistent, because while seqscans and
GIST index searches both failed to match empty tsvectors, GIN index
searches would find them, since GIN scans don't rely on ts_match_vq().
That makes this certainly a bug, not a debatable definition disagreement,
so back-patch to all supported branches.
Report and diagnosis by Tom Dunstan (bug #14515); added test cases by me.
Discussion: https://postgr.es/m/20170126025524.1434.97828@wrigleys.postgresql.org
This improves readability a bit and may make future improvements easier.
In passing, make sure that the JB_ROOT_IS_XXX macros deliver boolean (0/1)
results; the previous coding was a bug hazard, though no actual bugs are
known.
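The hazard is of this general shape (FLAG_BIT is a made-up name):

    #define HAS_FLAG_HAZARD(x)   ((x) & FLAG_BIT)          /* nonzero, not 0/1 */
    #define HAS_FLAG_BOOL(x)     (((x) & FLAG_BIT) != 0)   /* always 0 or 1 */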
Nikita Glukhov, extended a bit by me
Discussion: https://postgr.es/m/9e21a39c-c1d7-b9b5-44a0-c5345a5029f6@postgrespro.ru
This is useful infrastructure for an upcoming proposed patch to
allow the WAL segment size to be changed at initdb time; tools like
pg_basebackup need the ability to interrogate the server setting.
But it also doesn't seem like a bad thing to have independently of
that; it may find other uses in the future.
Robert Haas and Beena Emerson. (The original patch here was by
Beena, but I rewrote it to such a degree that most of the code
being committed here is mine.)
Discussion: http://postgr.es/m/CA+TgmobNo4qz06wHEmy9DszAre3dYx-WNhHSCbU9SAwf+9Ft6g@mail.gmail.com
We've accumulated quite a bit of stuff with which pgindent is not
quite happy in this code; clean it up to provide a less-annoying base
for future pgindent runs.
The code here previously tried to call the partitioning operator, but
really the right thing to do (and the safe thing to do) is to use
datumIsEqual().
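That is, roughly (variable names illustrative):

    /*
     * datumIsEqual() does a raw representation comparison, needing only
     * the type's typbyval/typlen, not any operator or catalog lookup.
     */
    if (datumIsEqual(datum1, datum2, typbyval, typlen))
    {
        /* bound values match */
    }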
Amit Langote, but I expanded the comment and fixed a compiler warning.
Project style is to put things in this order, for the good and sufficient
reason that you often need the typedefs in the function declarations.
There already was one function declaration that needed a typedef, which
was randomly placed away from all the other static function declarations
in consequence. And the submitted patch for better json_populate_record
functionality jumped through even more hoops in order to preserve this
bad idea.
This patch only moves lines from point A to point B, no other changes.