Commit graph

9163 commits

Author SHA1 Message Date
Peter Geoghegan
0d861bbb70 Add deduplication to nbtree.
Deduplication reduces the storage overhead of duplicates in indexes that
use the standard nbtree index access method.  The deduplication process
is applied lazily, after the point where opportunistic deletion of
LP_DEAD-marked index tuples occurs.  Deduplication is only applied at
the point where a leaf page split would otherwise be required.  New
posting list tuples are formed by merging together existing duplicate
tuples.  The physical representation of the items on an nbtree leaf page
is made more space efficient by deduplication, but the logical contents
of the page are not changed.  Even unique indexes make use of
deduplication as a way of controlling bloat from duplicates whose TIDs
point to different versions of the same logical table row.

The lazy approach taken by nbtree has significant advantages over a GIN
style eager approach.  Most individual inserts of index tuples have
exactly the same overhead as before.  The extra overhead of
deduplication is amortized across insertions, just like the overhead of
page splits.  The key space of indexes works in the same way as it has
since commit dd299df8 (the commit that made heap TID a tiebreaker
column).

Testing has shown that nbtree deduplication can generally make indexes
with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
smaller, even with single column integer indexes (e.g., an index on a
referencing column that accompanies a foreign key).  The final size of
single column nbtree indexes comes close to the final size of a similar
contrib/btree_gin index, at least in cases where GIN's posting list
compression isn't very effective.  This can significantly improve
transaction throughput, and significantly reduce the cost of vacuuming
indexes.

A new index storage parameter (deduplicate_items) controls the use of
deduplication.  The default setting is 'on', so all new B-Tree indexes
automatically use deduplication where possible.  This decision will be
reviewed at the end of the Postgres 13 beta period.

There is a regression of approximately 2% of transaction throughput with
synthetic workloads that consist of append-only inserts into a table
with several non-unique indexes, where all indexes have few or no
repeated values.  The underlying issue is that cycles are wasted on
unsuccessful attempts at deduplicating items in non-unique indexes.
There doesn't seem to be a way around it short of disabling
deduplication entirely.  Note that deduplication of items in unique
indexes is fairly well targeted in general, which avoids the problem
there (we can use a special heuristic to trigger deduplication passes in
unique indexes, since we're specifically targeting "version bloat").

Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.

No bump in BTREE_VERSION, since the representation of posting list
tuples works in a way that's backwards compatible with version 4 indexes
(i.e. indexes built on PostgreSQL 12).  However, users must still
REINDEX a pg_upgrade'd index to use deduplication, regardless of the
Postgres version they've upgraded from.  This is the only way to set the
new nbtree metapage flag indicating that deduplication is generally
safe.

Author: Anastasia Lubennikova, Peter Geoghegan
Reviewed-By: Peter Geoghegan, Heikki Linnakangas
Discussion:
    https://postgr.es/m/55E4051B.7020209@postgrespro.ru
    https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
2020-02-26 13:05:30 -08:00
Peter Geoghegan
612a1ab767 Add equalimage B-Tree support functions.
Invent the concept of a B-Tree equalimage ("equality implies image
equality") support function, registered as support function 4.  This
indicates whether it is safe (or not safe) to apply optimizations that
assume that any two datums considered equal by an operator class's order
method must be interchangeable without any loss of semantic information.
This is static information about an operator class and a collation.

Register an equalimage routine for almost all of the existing B-Tree
opclasses.  We only need two trivial routines for all of the opclasses
that are included with the core distribution.  There is one routine for
opclasses that index non-collatable types (which returns 'true'
unconditionally), plus another routine for collatable types (which
returns 'true' when the collation is a deterministic collation).

This patch is infrastructure for an upcoming patch that adds B-Tree
deduplication.

Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
2020-02-26 11:28:25 -08:00
Andres Freund
2742c45080 expression eval: Reduce number of steps for agg transition invocations.
Do so by combining the various steps that are part of aggregate
transition function invocation into one larger step. As some of the
current steps are only necessary for some aggregates, have one variant
of the aggregate transition step for each possible combination.

To avoid further manual copies of code in the different transition
step implementations, move most of the code into helper functions
marked as "always inline".

The benefit of this change is an increase in performance when
aggregating lots of rows. This comes in part due to the reduced number
of indirect jumps due to the reduced number of steps, and in part by
reducing redundant setup code across steps. This mainly benefits
interpreted execution, but the code generated by JIT is also improved
a bit.

As a nice side-effect it also ends up making the code a bit simpler.

A small additional optimization is removing the need to set
aggstate->curaggcontext before calling ExecAggInitGroup, choosing to
instead passign curaggcontext as an argument. It was, in contrast to
other aggregate related functions, only needed to fetch a memory
context to copy the transition value into.

Author: Andres Freund
Discussion:
   https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
   https://postgr.es/m/5c371df7cee903e8cd4c685f90c6c72086d3a2dc.camel@j-davis.com
2020-02-24 15:09:09 -08:00
Tom Lane
3d475515a1 Account explicitly for long-lived FDs that are allocated outside fd.c.
The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
2020-02-24 17:28:33 -05:00
Robert Haas
a91e2fa941 Adapt hashfn.c and hashutils.h for frontend use.
hash_any() and its various variants are defined to return Datum,
which is a backend-only concept, but the underlying functions
actually want to return uint32 and uint64, and only return Datum
because it's convenient for callers who are using them to
implement a hash function for some SQL datatype.

However, changing these functions to return uint32 and uint64
seems like it might lead to programming errors or back-patching
difficulties, both because they are widely used and because
failure to use UInt{32,64}GetDatum() might not provoke a
compilation error. Instead, rename the existing functions as
well as changing the return type, and add static inline wrappers
for those callers that need the previous behavior.

Although this commit adapts hashutils.h and hashfn.c so that they
can be compiled as frontend code, it does not actually do
anything that would cause them to be so compiled. That is left
for another commit.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:27:15 +05:30
Robert Haas
9341c783cc Put all the prototypes for hashfn.c into the same header file.
Previously, some of the prototypes for functions in hashfn.c were
in utils/hashutils.h and others were in utils/hsearch.h, but that
is confusing and has no particular benefit.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:22:45 +05:30
Robert Haas
07b95c3d83 Move bitmap_hash and bitmap_match to bitmapset.c.
The closely-related function bms_hash_value is already defined in that
file, and this change means that hashfn.c no longer needs to depend on
nodes/bitmapset.h. That gets us closer to allowing use of the hash
functions in hashfn.c in frontend code.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:17:43 +05:30
Tom Lane
f4d59369d2 Assume that we have signed integral types and flexible array members.
These compiler features are required by C99, so remove the configure
probes for them.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:48 -05:00
Tom Lane
97cf1fa4ed Assume that we have <wchar.h>.
Windows has this, and so do all other live platforms according to the
buildfarm; it's been required by POSIX since SUSv2.  So remove the
configure probe and tests of HAVE_WCHAR_H.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
481c8e9232 Assume that we have utime() and <utime.h>.
These are required by POSIX since SUSv2, and no live platforms fail
to provide them.  On Windows, utime() exists and we bring our own
<utime.h>, so we're good there too.  So remove the configure probes
and ad-hoc substitute code.  We don't need to check for utimes()
anymore either, since that was only used as a substitute.

In passing, make the Windows build include <sys/utime.h> only where
we need it, not everywhere.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
f88a058200 Assume that we have rint().
Windows has this since _MSC_VER >= 1200, and so do all other live
platforms according to the buildfarm, so remove the configure probe
and src/port/ substitution.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
1200d71a09 Assume that we have memmove().
Windows has this, and so do all other live platforms according to the
buildfarm, so remove the configure probe and c.h's substitute code.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
abe41f453a Assume that we have cbrt().
Windows has this, and so do all other live platforms according to the
buildfarm, so remove the configure probe and float.c's substitute code.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
7fde892bc1 Assume that we have isinf().
Windows has this, and so do all other live platforms according to the
buildfarm, so remove the configure probe and src/port/ substitution.

This also lets us get rid of some configure probes that existed only
to support src/port/isinf.c.  I kept the port.h hack to force using
__builtin_isinf() on clang, though.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Tom Lane
799d22461a Assume that we have functional, 64-bit fseeko()/ftello().
Windows has this, and so do all other live platforms according to the
buildfarm, so remove the configure probe and src/port/ substitution.

Keep the probe that detects whether _LARGEFILE_SOURCE has to be
defined to get that, though ... that seems to be still relevant in
some places.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Peter Eisentraut
957338418b Require stdint.h
stdint.h belongs to the compiler (as opposed to inttypes.h), so by
requiring a C99 compiler we can also require stdint.h
unconditionally.  Remove configure checks and other workarounds for
it.

This also removes a few steps in the required portability adjustments
to the imported time zone code, which can be applied on the next
import.

When using GCC on a platform that is otherwise pre-C99, this will now
require at least GCC 4.5, which is the first release that supplied a
standard-conforming stdint.h if the native platform didn't have it.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/5d398bbb-262a-5fed-d839-d0e5cff3c0d7%402ndquadrant.com
2020-02-21 09:20:32 +01:00
Peter Eisentraut
2ed19a488e Set gen_random_uuid() to volatile
It was set to immutable.  This was a mistake in the initial
commit (5925e55498).

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Discussion: https://www.postgresql.org/message-id/flat/20200218185452.GA8710%40depesz.com
2020-02-19 20:09:32 +01:00
Peter Eisentraut
c6679e4fca Optimize update of tables with generated columns
When updating a table row with generated columns, only recompute those
generated columns whose base columns have changed in this update and
keep the rest unchanged.  This can result in a significant performance
benefit.  The required information was already kept in
RangeTblEntry.extraUpdatedCols; we just have to make use of it.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
2020-02-17 15:20:58 +01:00
Peter Eisentraut
ad3ae64770 Fill in extraUpdatedCols in logical replication
The extraUpdatedCols field of the target RTE records which generated
columns are affected by an update.  This is used in a variety of
places, including per-column triggers and foreign data wrappers.  When
an update was initiated by a logical replication subscription, this
field was not filled in, so such an update would not affect generated
columns in a way that is consistent with normal updates.  To fix,
factor out some code from analyze.c to fill in extraUpdatedCols in the
logical replication worker as well.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
2020-02-17 15:20:57 +01:00
Fujii Masao
f4ae722141 Add description about GSSOpenServer wait event into document.
This commit also updates wait event enum into alphabetical order.
Previously the enum entry for GSSOpenServer was added out-of-order.

Back-patch to v12 where commit b0b39f72b9 introduced
GSSOpenServer wait event. In v12, the commit doesn't include
the update of wait event enum, not to break ABI.

Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/949931aa-4ed4-d867-a7b5-de9c02b2292b@oss.nttdata.com
2020-02-17 16:16:08 +09:00
Tom Lane
b78542b9e9 Run "make reformat-dat-files".
Mostly to make sure the previous commit didn't break this.

Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com
2020-02-15 14:58:30 -05:00
Tom Lane
86ff085e83 Don't require pg_class.dat to contain correct relnatts values.
Practically everybody who's ever added a column to one of the bootstrap
catalogs has been burnt by the need to update the relnatts field in the
initial pg_class data to match.  Now that we use Perl scripts to
generate postgres.bki, we can have the machines take care of that,
by filling the field during genbki.pl.

While at it, use the BKI_DEFAULTS mechanism to eliminate repetitive
specifications of other column values in pg_class.dat, too.  They
weren't particularly a maintenance problem, but this way is prettier
(certainly the spotty previous usage of BKI_DEFAULTS wasn't pretty).

No catversion bump needed, since this doesn't actually change the
contents of postgres.bki.

Per gripe from Justin Pryzby, though this is quite different from
his originally proposed solution.

Amit Langote, John Naylor, Tom Lane

Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com
2020-02-15 14:57:27 -05:00
Tom Lane
607f8ce74d Avoid a performance regression in float overflow/underflow detection.
Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static
inline subroutines, but that wasn't too well thought out.  In the original
coding, the unlikely condition (isinf(result) or result == 0) was checked
first, and the inf_is_valid or zero_is_valid condition only afterwards.
The inline-subroutine coding caused that to be swapped around, which is
pretty horrid for performance because (a) in common cases the is_valid
condition is twice as expensive to evaluate (e.g., requiring two isinf()
calls not one) and (b) in common cases the is_valid condition is false,
requiring us to perform the unlikely-condition check anyway.  Net result
is that one isinf() call becomes two or three, resulting in visible
performance loss as reported by Keisuke Kuroda.

The original fix proposal was to revert the replacement of the macro,
but on second thought, that macro was just a bad idea from the beginning:
if anything it's a net negative for readability of the code.  So instead,
let's just open-code all the overflow/underflow tests, being careful to
test the unlikely condition first (and mark it unlikely() to help the
compiler get the point).

Also, rather than having N copies of the actual ereport() calls, collapse
those into out-of-line error subroutines to save some code space.  This
does mean that the error file/line numbers won't be very helpful for
figuring out where the issue really is --- but we'd already burned that
bridge by putting the ereports into static inlines.

In HEAD, check_float[48]_val() are gone altogether.  In v12, leave them
present in float.h but unused in the core code, just in case some
extension is depending on them.

Emre Hasegeli, with some kibitzing from me and Andres Freund

Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
2020-02-13 13:37:43 -05:00
Peter Eisentraut
b691c189c6 Simplify passing of configure arguments to pg_config
The previous system had configure put the value into the makefiles and
then have the makefiles pass them to the build of pg_config.  That was
put in place when pg_config was a shell script.  We can simplify that
by having configure put the value into pg_config.h directly.  This
also makes the standard build system match how the MSVC build system
already does it.

Discussion: https://www.postgresql.org/message-id/flat/6e457870-cef5-5f1d-b57c-fc89cfb8a788%402ndquadrant.com
2020-02-10 19:23:41 +01:00
Jeff Davis
11de6c903d Change signature of TupleHashTableHash().
Commit 4eaea3db introduced TupleHashTableHash(), but the signature
didn't match the other exposed functions. Separate it into internal
and external versions. The external version hides the details behind
an API more consistent with the other external functions, and the
internal version is still suitable for simplehash.
2020-02-10 10:20:10 -08:00
Amit Kapila
3dfba9fdf5 Fix typos.
Reported-by: Justin Pryzby
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200206021432.GA24549@telsasoft.com
2020-02-10 09:31:18 +05:30
Fujii Masao
cb5b28613d Fix bug in Tid scan.
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan()
and uses the scan option for seq scan. This change caused two issues.

(1) The change caused Tid scan to take a predicate lock on the entire
       relation in serializable transaction even when relation-level
       lock is not necessary. This could lead to an unexpected
       serialization error.

(2) The change caused Tid scan to increment the number of seq_scan
       in pg_stat_*_tables views even though it's not seq scan. This
       could confuse the users.

This commit adds the scan option for Tid scan and makes Tid scan
use it, to avoid those issues.

Back-patch to v12, where the bug was introduced.

Author: Tatsuhito Kasahara
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
2020-02-07 22:06:31 +09:00
Andres Freund
b059d2f456 jit: Reference expression step functions via llvmjit_types.
The main benefit of doing so is that this allows llvm to ensure that
types match - previously that'd only be detected by a crash within the
called function. There were a number of cases where we passed a
superfluous parameter...

To avoid needing to add all the functions to llvmjit.{c,h}, instead
get them from the llvm module for llvmjit_types.c. Also use that for
the functions from llvmjit_types already in llvmjit.h.

Author: Soumyadeep Chakraborty and Andres Freund
Discussion: https://postgr.es/m/CADwEdooww3wZv-sXSfatzFRwMuwa186LyTwkBfwEW6NjtooBPA@mail.gmail.com
2020-02-06 22:29:14 -08:00
Michael Paquier
c4f3b63cab Bump catalog version for the addition of leader_pid in pg_stat_activity
Oversight in commit b025f32.

Per private report from Julien Rouhaud.
2020-02-07 15:08:17 +09:00
Jeff Davis
4eaea3db15 Introduce TupleHashTableHash() and LookupTupleHashEntryHash().
Expose two new entry points: one for only calculating the hash value
of a tuple, and another for looking up a hash entry when the hash
value is already known. This will be useful for disk-based Hash
Aggregation to avoid recomputing the hash value for the same tuple
after saving and restoring it from disk.

Discussion: https://postgr.es/m/37091115219dd522fd9ed67333ee8ed1b7e09443.camel%40j-davis.com
2020-02-06 20:34:01 -08:00
Andres Freund
1fdb7f9789 expression eval: Don't redundantly keep track of AggState.
It's already tracked via ExprState->parent, so we don't need to also
include it in ExprEvalStep. When that code originally was written
ExprState->parent didn't exist, but it since has been introduced in
6719b238e8.

Author: Andres Freund
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2020-02-06 19:54:43 -08:00
Jeff Davis
7d4395d0a1 Refactor hash_agg_entry_size().
Consolidate the calculations for hash table size estimation. This will
help with upcoming Hash Aggregation work that will add additional call
sites.
2020-02-06 11:49:56 -08:00
Michael Paquier
b025f32e0b Add leader_pid to pg_stat_activity
This new field tracks the PID of the group leader used with parallel
query.  For parallel workers and the leader, the value is set to the
PID of the group leader.  So, for the group leader, the value is the
same as its own PID.  Note that this reflects what PGPROC stores in
shared memory, so as leader_pid is NULL if a backend has never been
involved in parallel query.  If the backend is using parallel query or
has used it at least once, the value is set until the backend exits.

Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas
Vondra
Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com
2020-02-06 09:18:06 +09:00
Alvaro Herrera
15d13e8291 Make vacuum buffer counters 64 bits wide
Using 32 bit counters means they can now realistically wrap around when
vacuuming extremely large tables.  Because they're signed integers,
stats printed by vacuum look very odd when they do.

We'd love to backpatch this, but refrain because the variables are
exported and could cause third-party code to break.

Reviewed-by: Julien Rouhaud, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20200131205926.GA16367@alvherre.pgsql
2020-02-05 16:59:29 -03:00
Thomas Munro
815c2f0972 Add kqueue(2) support to the WaitEventSet API.
Use kevent(2) to wait for events on the BSD family of operating
systems and macOS.  This is similar to the epoll(2) support added
for Linux by commit 98a64d0bd.

Author: Thomas Munro
Reviewed-by: Andres Freund, Marko Tiikkaja, Tom Lane
Tested-by: Mateusz Guzik, Matteo Beccati, Keith Fiske, Heikki Linnakangas, Michael Paquier, Peter Eisentraut, Rui DeSousa, Tom Lane, Mark Wong
Discussion: https://postgr.es/m/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com
2020-02-05 17:35:57 +13:00
Michael Paquier
f1f10a1ba9 Add declaration-level assertions for compile-time checks
Those new assertions can be used at file scope, outside of any function
for compilation checks.  This commit provides implementations for C and
C++, and fallback implementations.

Author: Peter Smith
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Dagfinn Ilmari Mannsåker,
Michael Paquier
Discussion: https://postgr.es/m/201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217
2020-02-03 14:48:42 +09:00
Andrew Gierth
1fd687a035 Optimizations for integer to decimal output.
Using a lookup table of digit pairs reduces the number of divisions
needed, and calculating the length upfront saves some work; these
ideas are taken from the code previously committed for floats.

David Fetter, reviewed by Kyotaro Horiguchi, Tels, and me.

Discussion: https://postgr.es/m/20190924052620.GP31596%40fetter.org
2020-02-01 21:57:14 +00:00
Tom Lane
74b35eb468 Fix CheckAttributeType's handling of collations for ranges.
Commit fc7695891 changed CheckAttributeType to recurse into ranges,
but made it pass down the wrong collation (always InvalidOid, since
ranges as such have no collation).  This would result in guaranteed
failure when considering a range type whose subtype is collatable.

Embarrassingly, we lack any regression tests that would expose such
a problem (but fortunately, somebody noticed before we shipped this
bug in any release).

Fix it to pass down the range's subtype collation property instead,
and add some regression test cases to exercise collatable-subtype
ranges a bit more.  Back-patch to all supported branches, as the
previous patch was.

Report and patch by Julien Rouhaud, test cases tweaked by me

Discussion: https://postgr.es/m/CAOBaU_aBWqNweiGUFX0guzBKkcfJ8mnnyyGC_KBQmO12Mj5f_A@mail.gmail.com
2020-01-31 17:03:55 -05:00
Peter Eisentraut
a9cff89f7e Allow building without default socket directory
We have code paths for Unix socket support and no Unix socket support.
Now add a third variant: Unix socket support but do not use a Unix
socket by default in the client or the server, only if you explicitly
specify one.  This will be useful when we enable Unix socket support
on Windows.

To implement this, tweak things so that setting DEFAULT_PGSOCKET_DIR
to "" has the desired effect.  This mostly already worked like that;
only a few places needed to be adjusted.  Notably, the reference to
DEFAULT_PGSOCKET_DIR in UNIXSOCK_PATH() could be removed because all
callers already resolve an empty socket directory setting with a
default if appropriate.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/75f72249-8ae6-322a-63df-4fe03eeccb9f@2ndquadrant.com
2020-01-31 16:28:43 +01:00
Peter Eisentraut
7c23bfd25c Sprinkle some const decorations
This might help clarify the API a bit.
2020-01-31 12:52:08 +01:00
Tom Lane
50fc694e43 Invent "trusted" extensions, and remove the pg_pltemplate catalog.
This patch creates a new extension property, "trusted".  An extension
that's marked that way in its control file can be installed by a
non-superuser who has the CREATE privilege on the current database,
even if the extension contains objects that normally would have to be
created by a superuser.  The objects within the extension will (by
default) be owned by the bootstrap superuser, but the extension itself
will be owned by the calling user.  This allows replicating the old
behavior around trusted procedural languages, without all the
special-case logic in CREATE LANGUAGE.  We have, however, chosen to
loosen the rules slightly: formerly, only a database owner could take
advantage of the special case that allowed installation of a trusted
language, but now anyone who has CREATE privilege can do so.

Having done that, we can delete the pg_pltemplate catalog, moving the
knowledge it contained into the extension script files for the various
PLs.  This ends up being no change at all for the in-core PLs, but it is
a large step forward for external PLs: they can now have the same ease
of installation as core PLs do.  The old "trusted PL" behavior was only
available to PLs that had entries in pg_pltemplate, but now any
extension can be marked trusted if appropriate.

This also removes one of the stumbling blocks for our Python 2 -> 3
migration, since the association of "plpythonu" with Python 2 is no
longer hard-wired into pg_pltemplate's initial contents.  Exactly where
we go from here on that front remains to be settled, but one problem
is fixed.

Patch by me, reviewed by Peter Eisentraut, Stephen Frost, and others.

Discussion: https://postgr.es/m/5889.1566415762@sss.pgh.pa.us
2020-01-29 18:42:43 -05:00
Robert Haas
beb4699091 Move jsonapi.c and jsonapi.h to src/common.
To make this work, (1) makeJsonLexContextCstringLen now takes the
encoding to be used as an argument; (2) check_stack_depth() is made to
do nothing in frontend code, and (3) elog(ERROR, ...) is changed to
pg_log_fatal + exit in frontend code.

Mark Dilger, reviewed and slightly revised by me.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-29 10:22:51 -05:00
Thomas Munro
6f38d4dac3 Remove dependency on HeapTuple from predicate locking functions.
The following changes make the predicate locking functions more
generic and suitable for use by future access methods:

- PredicateLockTuple() is renamed to PredicateLockTID().  It takes
  ItemPointer and inserting transaction ID instead of HeapTuple.

- CheckForSerializableConflictIn() takes blocknum instead of buffer.

- CheckForSerializableConflictOut() no longer takes HeapTuple or buffer.

Author: Ashwin Agrawal
Reviewed-by: Andres Freund, Kuntal Ghosh, Thomas Munro
Discussion: https://postgr.es/m/CALfoeiv0k3hkEb3Oqk%3DziWqtyk2Jys1UOK5hwRBNeANT_yX%2Bng%40mail.gmail.com
2020-01-28 13:13:04 +13:00
Robert Haas
73ce2a03f3 Move some code from jsonapi.c to jsonfuncs.c.
Specifically, move those functions that depend on ereport()
from jsonapi.c to jsonfuncs.c, in preparation for allowing
jsonapi.c to be used from frontend code.

A few cases where elog(ERROR, ...) is used for can't-happen
conditions are left alone; we can handle those in some other
way in frontend code.

Reviewed by Mark Dilger and Andrew Dunstan.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-27 11:22:13 -05:00
Robert Haas
1f3a021730 Adjust pg_parse_json() so that it does not directly ereport().
Instead, it now returns a value indicating either success or the
type of error which occurred. The old behavior is still available
by calling pg_parse_json_or_ereport(). If the new interface is
used, an error can be thrown by passing the return value of
pg_parse_json() to json_ereport_error().

pg_parse_json() can still elog() in can't-happen cases, but it
seems like that issue is best handled separately.

Adjust json_lex() and json_count_array_elements() to return an
error code, too.

This is all in preparation for making the backend's json parser
available to frontend code.

Reviewed and/or tested by Mark Dilger and Andrew Dunstan.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-27 11:04:51 -05:00
Heikki Linnakangas
38a957316d Refactor XLogReadRecord(), adding XLogBeginRead() function.
The signature of XLogReadRecord() required the caller to pass the starting
WAL position as argument, or InvalidXLogRecPtr to continue reading at the
end of previous record. That's slightly awkward to the callers, as most
of them don't want to randomly jump around in the WAL stream, but start
reading at one position and then read everything from that point onwards.
Remove the 'RecPtr' argument and add a new function XLogBeginRead() to
specify the starting position instead. That's more convenient for the
callers. Also, xlogreader holds state that is reset when you change the
starting position, so having a separate function for doing that feels like
a more natural fit.

This changes XLogFindNextRecord() function so that it doesn't reset the
xlogreader's state to what it was before the call anymore. Instead, it
positions the xlogreader to the found record, like XLogBeginRead().

Reviewed-by: Kyotaro Horiguchi, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/5382a7a3-debe-be31-c860-cb810c08f366%40iki.fi
2020-01-26 11:39:00 +02:00
Tom Lane
1001368497 Clean up EXPLAIN's handling of per-worker details.
Previously, it was possible for EXPLAIN ANALYZE of a parallel query
to produce several different "Workers" fields for a single plan node,
because different portions of explain.c independently generated
per-worker data and wrapped that output in separate fields.  This
is pretty bogus, especially for the structured output formats: even
if it's not technically illegal, most programs would have a hard time
dealing with such data.

To improve matters, add infrastructure that allows redirecting
per-worker values into a side data structure, and then collect that
data into a single "Workers" field after we've finished running all
the relevant code for a given plan node.

There are a few visible side-effects:

* In text format, instead of something like

  Sort Method: external merge  Disk: 4920kB
  Worker 0:  Sort Method: external merge  Disk: 5880kB
  Worker 1:  Sort Method: external merge  Disk: 5920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Buffers: shared hit=345 read=3507, temp read=505 written=744

you get

  Sort Method: external merge  Disk: 4920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Sort Method: external merge  Disk: 5880kB
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Sort Method: external merge  Disk: 5920kB
    Buffers: shared hit=345 read=3507, temp read=505 written=744

* When JIT is enabled, any relevant per-worker JIT stats are attached
to the child node of the Gather or Gather Merge node, which is where
the other per-worker output has always been.  Previously, that info
was attached directly to a Gather node, or missed entirely for Gather
Merge.

* A query's summary JIT data no longer includes a bogus
"Worker Number: -1" field.

A notable code-level change is that indenting for lines of text-format
output should now be handled by calling "ExplainIndentText(es)",
instead of hard-wiring how much space to emit.  This seems a good deal
cleaner anyway.

This patch also adds a new "explain.sql" regression test script that's
dedicated to testing EXPLAIN.  There is more that can be done in that
line, certainly, but for now it just adds some coverage of the XML and
YAML output formats, which had been completely untested.

Although this is surely a bug fix, it's not clear that people would
be happy with rearranging EXPLAIN output in a minor release, so apply
to HEAD only.

Maciek Sakrejda and Tom Lane, based on an idea of Andres Freund's;
reviewed by Georgios Kokolatos

Discussion: https://postgr.es/m/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com
2020-01-25 18:16:42 -05:00
Dean Rasheed
13661ddd7e Add functions gcd() and lcm() for integer and numeric types.
These compute the greatest common divisor and least common multiple of
a pair of numbers using the Euclidean algorithm.

Vik Fearing, reviewed by Fabien Coelho.

Discussion: https://postgr.es/m/adbd3e0b-e3f1-5bbc-21db-03caf1cef0f7@2ndquadrant.com
2020-01-25 14:00:59 +00:00
Robert Haas
11b5e3e35d Split JSON lexer/parser from 'json' data type support.
Keep the code that pertains to the 'json' data type in json.c, but
move the lexing and parsing code to a new file jsonapi.c, a name
I chose because the corresponding prototypes are in jsonapi.h.

This seems like a logical division, because the JSON lexer and parser
are also used by the 'jsonb' data type, but the SQL-callable functions
in json.c are a separate thing. Also, the new jsonapi.c file needs to
include far fewer header files than json.c, which seems like a good
sign that this is an appropriate place to insert an abstraction
boundary. I took the opportunity to remove a few apparently-unneeded
includes from json.c at the same time.

Patch by me, reviewed by David Steele, Mark Dilger, and Andrew
Dunstan. The previous commit was, too, but I forgot to note it
in the commit message.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-24 10:17:43 -08:00
Robert Haas
ce0425b162 Adjust src/include/utils/jsonapi.h so it's not backend-only.
The major change here is that we no longer include jsonb.h into
jsonapi.h. The reason that was necessary is that jsonapi.h included
several prototypes functions in jsonfuncs.c that depend on the Jsonb
type. Move those prototypes to a new header, jsonfuncs.h, and include
it where needed.

The other change is that JsonEncodeDateTime is now declared in
json.h rather than jsonapi.h.

Taken together, these steps eliminate all dependencies of jsonapi.h
on backend-only data types and header files, so that it can
potentially be included in frontend code.
2020-01-24 09:58:37 -08:00