"pg_dump -Ft -f filename ..." got broken by my recent commit
4317e0246c, which I fear I only tested
in the output-to-stdout variant.
Report and fix by Muhammad Asif Naeem.
This is currently only cosmetic, since all the call sites just curl up
and die in the event of a failure return. It might be important for some
future use-case, though, and in any case it quiets warnings from the
clang static analyzer (as reported by Anna Zaks).
Josh Kupershmidt
The initial implementation of pg_dump's --section option supposed that the
existing --schema-only and --data-only options could be made equivalent to
--section settings. This is wrong, though, due to dubious but long since
set-in-stone decisions about where to dump SEQUENCE SET items, as seen in
a bug report from Martin Pitt. (And I'm not totally convinced there weren't
other bugs, either.) Undo that coupling and instead drive --section
filtering off current-section state tracked as we scan through the TOC
list to call _tocEntryRequired().
To make sure those decisions don't shift around and hopefully save a few
cycles, run _tocEntryRequired() only once per TOC entry and save the result
in a new TOC field. This required minor rejiggering of ACL handling but
also allows a far cleaner implementation of inhibit_data_for_failed_table.
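A minimal sketch of the compute-once pattern (the field and flag names here
are illustrative, not a quote of the committed code):

    /* first pass: evaluate each TOC entry once and cache the verdict */
    for (te = AH->toc->next; te != AH->toc; te = te->next)
        te->reqs = _tocEntryRequired(te, curSection, ropt);

    /* later passes just consult the cached flags */
    if ((te->reqs & REQ_SCHEMA) != 0)
        _printTocEntry(AH, te, ropt, false, false);
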
Also, to ensure that pg_dump and pg_restore have the same behavior with
respect to the --section switches, add _tocEntryRequired() filtering to
WriteToc() and WriteDataChunks(), rather than trying to implement section
filtering in an entirely orthogonal way in dumpDumpableObject(). This
required adjusting the handling of the special ENCODING and STDSTRINGS
items, but they were pretty weird before anyway.
Minor other code review for the patch, too.
This patch fixes three places (which AFAICT is all of them) where runtime
was O(N^2) in the number of TOC entries, by using an index array to replace
linear searches of the TOC list. This performance issue is a bit less bad
than those recently fixed, because it depends on the number of items dumped,
not the number in the source database, so the problem can be dodged by
doing partial dumps.
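The shape of the fix, roughly (array and function names are meant to
illustrate the technique, not to quote the patch):

    /* build once: a lookup table indexed by dump ID */
    AH->tocsByDumpId = (TocEntry **)
        pg_calloc(AH->maxDumpId, sizeof(TocEntry *));
    for (te = AH->toc->next; te != AH->toc; te = te->next)
        AH->tocsByDumpId[te->dumpId - 1] = te;

    /* then each lookup is O(1) instead of a walk down the TOC list */
    static TocEntry *
    getTocEntryByDumpId(ArchiveHandle *AH, DumpId id)
    {
        return AH->tocsByDumpId[id - 1];
    }
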
The previous coding already had an instance of one of the two index arrays
needed, but it was only calculated in parallel-restore cases; now we need
it all the time. I also chose to move the arrays into the ArchiveHandle
data structure, to make this code a bit more ready for the day that we
try to sling multiple ArchiveHandles around in pg_dump or pg_restore.
Since we still need some server-side work before pg_dump can really cope
nicely with tens of thousands of tables, there's probably little point in
back-patching.
The previous coding presented a significant bottleneck when dumping
databases containing many thousands of schemas, since the total time
spent searching would increase roughly as O(N^2) in the number of objects.
Noted by Jeff Janes, though I rewrote his proposed patch to use the
existing findObjectByOid infrastructure.
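findObjectByOid does a binary search in an array of objects sorted by OID;
in outline (a sketch of the idea rather than the exact committed code):

    /* O(log N) lookup in an OID-sorted index array, built once
     * after all objects have been read in */
    static DumpableObject *
    findObjectByOid(Oid oid, DumpableObject **indexArray, int numObjs)
    {
        int     low = 0;
        int     high = numObjs - 1;

        while (low <= high)
        {
            int     middle = (low + high) / 2;
            Oid     thisOid = indexArray[middle]->catId.oid;

            if (oid < thisOid)
                high = middle - 1;
            else if (oid > thisOid)
                low = middle + 1;
            else
                return indexArray[middle];
        }
        return NULL;
    }
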
Since this is a longstanding performance bug, backpatch to all supported
versions.
The original syntax wasn't universally loved, and it could be used only
in ALTER TABLE, not in CREATE TABLE. It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.
The pg_constraint column has accordingly been renamed connoinherit.
This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.
Author: Nikhil Sontakke
Some tweaks by me
Somebody didn't bother to fix this comment while adding foreign table
support to the code below it.
In passing, remove the explicit calling-out of relkind letters, which adds
complexity to the comment but doesn't help in understanding the code.
Combining the loop workspace with the record of already-processed objects
might have been a cute trick, but it behaves horridly if there are many
dependency loops to repair: the time spent in the first step of findLoop()
grows as O(N^2). Instead use a separate flag array indexed by dump ID,
which we can check in constant time. The length of the workspace array
is now never more than the actual length of a dependency chain, which
should be reasonably short in all cases of practical interest. The code
is noticeably easier to understand this way, too.
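In sketch form (names illustrative):

    /* constant-time membership test, replacing an O(N) scan of the
     * shared workspace for each visited object */
    bool   *processed = (bool *) pg_calloc(maxDumpId + 1, sizeof(bool));

    ...
    if (processed[obj->dumpId])
        return;                     /* already handled this object */
    processed[obj->dumpId] = true;
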
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
The loop that matched owned sequences to their owning tables required time
proportional to the number of owned sequences times the number of tables,
although this work was only expended in selective-dump situations, which is
probably why the issue went unrecognized for so long. Refactor slightly so that we
can perform this work after the index array for findTableByOid has been
set up, reducing the time to O(M log N).
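That is, roughly (a sketch assuming the sequence's owning-table OID is
recorded in an owning_tab field, as in the surrounding code; names
illustrative):

    /* one O(log N) findTableByOid probe per owned sequence, instead of
     * rescanning the whole table list for each one */
    for (i = 0; i < numOwnedSeqs; i++)
    {
        TableInfo  *seqinfo = ownedSeqs[i];
        TableInfo  *owning_tab = findTableByOid(seqinfo->owning_tab);

        /* dump the sequence only if its owning table is being dumped */
        if (owning_tab && owning_tab->dobj.dump)
            seqinfo->dobj.dump = true;
    }
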
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
ecpg and pg_dump each contain keyword arrays with structure similar
to the backend's keyword array. Up to now, we actually named those
arrays the same as the backend's and relied on parser/keywords.h
to declare them. This seems a tad too cute, though, and it breaks
now that we need to PGDLLIMPORT-decorate the backend symbols.
Rename to avoid the problem. Per buildfarm.
(It strikes me that maybe we should get rid of the separate keywords.c
files altogether, and just define these arrays in the modules that use
them, but that's a rather more invasive change.)
Some of these are newly added, some are older and were forgotten, some
don't contain any translatable strings right now but look like they
could in the future.
The old code used either exit_horribly or die_horribly, depending on
whether it had an ArchiveHandle on which to close the connection;
but there were places that were passing a NULL ArchiveHandle to
die_horribly, and other places that used exit_horribly while having an
AH available. So there wasn't all that much consistency.
Improve the situation by keeping only one of the routines, and instead
of having to pass the AH down from the caller, arrange for it to be
present for an on_exit_nicely callback to operate on.
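The arrangement looks roughly like this (a sketch; the callback name is
illustrative):

    /* registered once the connection exists, so that exit_horribly()
     * callers no longer need an AH just to get the connection closed */
    static void
    archive_close_connection(int code, void *arg)
    {
        ArchiveHandle *AH = (ArchiveHandle *) arg;

        if (AH->connection)
            PQfinish(AH->connection);
    }

    on_exit_nicely(archive_close_connection, AH);
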
Author: Joachim Wieland
Some tweaks by me
Per a suggestion from Robert Haas, in the ongoing "parallel pg_dump"
saga.
This was for demonstration only, and now that it is creating compiler
warnings from zlib without an obvious fix (see also
d923125b77), let's just remove it. The
"directory" format is presumably similar enough anyway.
Although we often don't care about freeing all memory in pg_dump,
these functions already freed the same memory in other code paths, so
we might as well do it consistently.
Found by Coverity.
gzFile is already a pointer, so code like

    gzFile *handle = gzopen(...)

is wrong.
This used to pass silently because gzFile used to be defined as void*,
and you can assign a void* to a void**. But somewhere between zlib
versions 1.2.3.4 and 1.2.6, the definition of gzFile was changed to
struct gzFile_s *, and with that new definition this usage causes
compiler warnings.
So remove all those extra pointer decorations.
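That is, the corrected form is just

    gzFile handle = gzopen(...)

with gzFile used as the handle type directly.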
There is a related issue in pg_backup_archiver.h, where

    FILE *FH; /* General purpose file handle */

is used throughout pg_dump as sometimes a real FILE* and sometimes a
gzFile handle, which also causes warnings now. This is not yet fixed
here, because it might need more code restructuring.
pg_dump was incautious about sanitizing object names that are emitted
within SQL comments in its output script. A name containing a newline
would at least render the script syntactically incorrect. Maliciously
crafted object names could present a SQL injection risk when the script
is reloaded.
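The essential defense is to strip newlines (and anything else that could
terminate a "--" comment early) from names before embedding them; a
hypothetical sketch, not the committed code:

    /* replace line-ending characters so a name can't break out of
     * the "-- Name: ...; Type: ...; Schema: ..." comment it's put in */
    static char *
    sanitize_comment_text(const char *s)
    {
        char   *copy = pg_strdup(s);
        char   *p;

        for (p = copy; *p; p++)
        {
            if (*p == '\n' || *p == '\r')
                *p = ' ';
        }
        return copy;
    }
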
Reported by Heikki Linnakangas, patch by Robert Haas
Security: CVE-2012-0868
Use exit_horribly() and ExecuteSqlQueryForSingleRow() in various
places where it's equivalent, or nearly equivalent, to the prior
coding. Apart from being more compact, this also makes the error
messages for the wrong-number-of-tuples case more consistent.
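For reference, the single-row helper is essentially a wrapper along these
lines (error wording simplified here):

    /* run a query, insisting on exactly one result row */
    static PGresult *
    ExecuteSqlQueryForSingleRow(Archive *fout, char *query)
    {
        PGresult   *res = ExecuteSqlQuery(fout, query, PGRES_TUPLES_OK);

        if (PQntuples(res) != 1)
            exit_horribly(NULL,
                          "query returned %d rows instead of one: %s\n",
                          PQntuples(res), query);
        return res;
    }
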
Parallel pg_dump wants to have multiple ArchiveHandle objects, and
therefore multiple PGconns, in play at the same time. This should
be just about the end of the refactoring that we need in order to
make that workable.
Any patches apt to get broken have probably already been broken by the
error-handling cleanups I just did, so we might as well clean this up
at the same time.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects,
which can now be marked LEAKPROOF, are an exception: they're OK. This
allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).
There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag. But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.
KaiGai Kohei, with some wordsmithing by me.
If an extension has not been selected to be dumped (perhaps because of
a --schema or --table switch), the contents of its configuration tables
surely should not get dumped either. Per gripe from
Hubert Depesz Lubaczewski.
In pre-7.3 databases, pg_attribute.attislocal doesn't exist. The easiest
way to make sure the new inheritance logic behaves sanely is to assume it's
TRUE, not FALSE. This will result in printing child columns even when
they're not really needed. We could work harder at trying to reconstruct a
value for attislocal, but there is little evidence that anyone still cares
about dumping from such old versions, so just do the minimum necessary to
have a valid dump.
I had this correct in the original draft of the patch, but for some
unaccountable reason decided it wasn't necessary to change the value.
Testing against an old server shows otherwise...
Revise pg_dump's handling of inherited columns, which was last looked at
seriously in 2001, to eliminate several misbehaviors associated with
inherited default expressions and NOT NULL flags. In particular make sure
that a column is printed in a child table's CREATE TABLE command if and
only if it has attislocal = true; the former behavior would sometimes cause
a column to become marked attislocal when it was not so marked in the
source database. Also, stop relying on textual comparison of default
expressions to decide if they're inherited; instead, don't use
default-expression inheritance at all, but just install the default
explicitly at each level of the hierarchy. This fixes the
search-path-related misbehavior recently exhibited by Chester Young, and
also removes some dubious assumptions about the order in which ALTER TABLE
SET DEFAULT commands would be executed.
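The new rule reduces to something like this (a sketch; pg_dump's TableInfo
really does carry per-column attislocal and attisdropped arrays):

    /* print a column in a child's CREATE TABLE iff it is locally
     * defined there (and not a dropped column) */
    static bool
    shouldPrintColumn(TableInfo *tbinfo, int colno)
    {
        return tbinfo->attislocal[colno] && !tbinfo->attisdropped[colno];
    }
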
Back-patch to all supported branches.
Various filters that were meant to prevent dumping of table data were not
being applied to extension config tables, notably --exclude-table-data and
--no-unlogged-table-data. We also would bogusly try to dump data from
views, sequences, or foreign tables, should an extension try to claim they
were config tables. Fix all that, and refactor/redocument to try to make
this a bit less fragile. This reverts the implementation, though not the
feature, of commit 7b070e896c, which had
broken config-table dumping altogether :-(.
It is still the case that the code will dump config-table data even if
--schema is specified. That behavior was intentional, as per the comments
in getExtensionMembership, so I think it requires some more discussion
before we change it.
This is another round of refactoring to make things simpler for parallel
pg_dump. pg_dump.c now issues SQL queries through the relevant Archive
object, rather than relying on the global variable g_conn. This commit
isn't quite enough to get rid of g_conn entirely, but it makes a big
dent in its utilization and, along the way, manages to be slightly less
code than before.
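The before-and-after calling convention, schematically:

    /* old style: helpers reached for the global connection */
    res = PQexec(g_conn, query->data);

    /* new style: queries go through the Archive, which owns its
     * connection (exact helper signatures may differ) */
    res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
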
Instead, everything that needs the Archive object now gets it as a
parameter. This is necessary infrastructure for parallel pg_dump,
but is also amply justified by the ugliness of the current code
(though a lot more than this is needed to fix that problem).
Change various places in the code that are referencing the global
Archive object g_fout to instead reference the Archive object fout
which is already being passed as a parameter. For parallel pg_dump to
work, we're going to need multiple Archive(Handle) objects, so the
real solution here is to pass down the Archive object to everywhere
that it needs to go, but we might as well pick the low-hanging fruit
first.
Parallel dump will need to repeat these steps for each new connection,
so it's better to have this logic in its own function.
Extracted (with some changes) from a much larger patch
by Joachim Wieland.
In commit 6545a901aa, I removed the mini SQL
lexer that was in pg_backup_db.c, thinking that it had no real purpose
beyond separating COPY data from SQL commands, which purpose had been
obsoleted by long-ago fixes in pg_dump's archive file format.
Unfortunately this was in error: that code was also used to identify
command boundaries in INSERT-style table data, which is run together as a
single string in the archive file for better compressibility. As a result,
direct-to-database restores from archive files made with --inserts or
--column-inserts fail in our latest releases, as reported by Dick Visser.
To fix, restore the mini SQL lexer, but simplify it by adjusting the
calling logic so that it's only required to cope with INSERT-style table
data, not arbitrary SQL commands. This allows us to not have to deal with
SQL comments, E'' strings, or dollar-quoted strings, none of which have
ever been emitted by dumpTableData_insert.
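The simplified scanner only has to find statement boundaries while
respecting quoted literals, along these lines (a hypothetical sketch;
state names are illustrative):

    /* walk the buffered INSERT data, sending a statement whenever a
     * semicolon is seen outside any quotes */
    for (; *qry; qry++)
    {
        switch (state)
        {
            case SCAN_NORMAL:
                if (*qry == ';')
                    /* end of statement: execute what we've buffered */ ;
                else if (*qry == '\'')
                    state = SCAN_IN_SINGLE_QUOTE;
                else if (*qry == '"')
                    state = SCAN_IN_DOUBLE_QUOTE;
                break;
            case SCAN_IN_SINGLE_QUOTE:
                if (*qry == '\'')
                    state = SCAN_NORMAL;    /* '' doubling just re-enters */
                break;
            case SCAN_IN_DOUBLE_QUOTE:
                if (*qry == '"')
                    state = SCAN_NORMAL;
                break;
        }
    }
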
Also, fix the lexer to cope with standard-conforming strings, which was the
actual bug that the previous patch was meant to solve.
Back-patch to all supported branches. The previous patch went back to 8.2,
which unfortunately means that the EOL release of 8.2 contains this bug,
but I don't think we're doing another 8.2 release just because of that.
pg_dump sorts operators by name, but operators with the same name come
out in random order. Now operators with the same name are dumped in
the order prefix, postfix, infix. (This is consistent with functions,
which are dumped in order of increasing argument count.)
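A comparator along these lines would do it (illustrative, not the
committed code):

    /* rank operators of the same name: prefix ('l'), then postfix ('r'),
     * then infix ('b') */
    static int
    oprKindRank(char oprkind)
    {
        switch (oprkind)
        {
            case 'l':
                return 0;
            case 'r':
                return 1;
            case 'b':
                return 2;
        }
        return 3;               /* shouldn't happen */
    }

    /* ... used as a tiebreaker in the name-comparison function ... */
    cmpval = oprKindRank(opr1->oprkind) - oprKindRank(opr2->oprkind);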