Invent new RawParseModes that allow the core grammar to handle
pl/pgsql expressions and assignments directly, and thereby get rid
of a lot of hackery in pl/pgsql's parser. This moves a good deal
of knowledge about pl/pgsql into the core code: notably, we have to
invent a CoercionContext that matches pl/pgsql's (rather dubious)
historical behavior for assignment coercions. That's getting away
from the original idea of pl/pgsql as an arm's-length extension of
the core, but really we crossed that bridge a long time ago.
The main advantage of doing this is that we can now use the core
parser to generate FieldStore and/or SubscriptingRef nodes to handle
assignments to pl/pgsql variables that are records or arrays. That
fixes a number of cases that had never been implemented in pl/pgsql
assignment, such as nested records and array slicing, and it allows
pl/pgsql assignment to support the datatype-specific subscripting
behaviors introduced in commit c7aba7c14.
There are cosmetic benefits too: when a syntax error occurs in a
pl/pgsql expression, the error report no longer includes the confusing
"SELECT" keyword that used to get prefixed to the expression text.
Also, there seem to be some small speed gains.
Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
This patch essentially allows gram.y to implement a family of related
syntax trees, rather than necessarily always parsing a list of SQL
statements. raw_parser() gains a new argument, enum RawParseMode,
to say what to do. As proof of concept, add a mode that just parses
a TypeName without any other decoration, and use that to greatly
simplify typeStringToTypeName().
In addition, invent a new SPI entry point SPI_prepare_extended() to
allow SPI users (particularly plpgsql) to get at this new functionality.
In hopes of making this the last variant of SPI_prepare(), set up its
additional arguments as a struct rather than direct arguments, and
promise that future additions to the struct can default to zero.
SPI_prepare_cursor() and SPI_prepare_params() can perhaps go away at
some point.
Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
These were broken in multiple ways:
* The xbstart and xhstart lexer actions neglected to set
"state_before_str_start" before transitioning to the xb/xh states,
thus possibly resulting in "internal error: unreachable state" later.
* The test for valid string contents at the end of xb state was flat out
wrong, as it accounted incorrectly for the "b" prefix that the xbstart
action had injected. Meanwhile, the xh state had no such check at all.
* The generated literal value failed to include any quote marks.
* The grammar did the wrong thing anyway, typically ignoring the
literal value and emitting something else, since BCONST and XCONST
tokens were handled randomly differently from SCONST tokens.
The first of these problems is evidently an oversight in commit
7f380c59f, but the others seem to be very ancient. The lack of
complaints shows that ECPG users aren't using these syntaxes much
(although I do vaguely remember one previous complaint).
As written, this patch is dependent on 7f380c59f, so it can't go
back further than v13. Given the shortage of complaints, I'm not
excited about adapting the patch to prior branches.
Report and patch by Shenhao Wang (test case adjusted by me)
Discussion: https://postgr.es/m/d6402f1bacb74ecba22ef715dbba17fd@G08CNEXMBPEKD06.g08.fujitsu.local
ECPG's PREPARE ... FROM and EXECUTE IMMEDIATE can optionally take
the target query as a simple literal, rather than the more usual
string-variable reference. This was previously documented as
being a C string literal, but that's a lie in one critical respect:
you can't write a data double quote as \" in such literals. That's
because the lexer is in SQL mode at this point, so it'll parse
double-quoted strings as SQL identifiers, within which backslash
is not special, so \" ends the literal.
I looked into making this work as documented, but getting the lexer
to switch behaviors at just the right point is somewhere between
very difficult and impossible. It's not really worth the trouble,
because these cases are next to useless: if you have a fixed SQL
statement to execute or prepare, you might as well write it as
a direct EXEC SQL, saving the messiness of converting it into
a string literal and gaining the opportunity for compile-time
SQL syntax checking.
Instead, let's just document (and test) the workaround of writing
a double quote as an octal escape (\042) in such cases.
There's no code behavioral change here, so in principle this could
be back-patched, but it's such a niche case I doubt it's worth
the trouble.
Per report from 1250kv.
Discussion: https://postgr.es/m/673825.1603223178@sss.pgh.pa.us
If you write the literal 'abc''def' in an EXEC SQL command, that will
come out the other end as 'abc'def', triggering a syntax error in the
backend. Likewise, "abc""def" is reduced to "abc"def" which is wrong
syntax for a quoted identifier.
The cause is that the lexer thinks it should emit just one quote
mark, whereas what it really should do is keep the string as-is.
Add some docs and test cases, too.
Although this seems clearly a bug, I fear users wouldn't appreciate
changing it in minor releases. Some may well be working around it
by applying an extra doubling of affected quotes, as for example
sql/dyntest.pgc has been doing.
Per investigation of a report from 1250kv, although this isn't
exactly what he/she was on about.
Discussion: https://postgr.es/m/673825.1603223178@sss.pgh.pa.us
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.
We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.
It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
This ought to work much like C's "#elif defined(name)"; but the code
implemented it in a way equivalent to endif followed by ifdef, so that
it didn't matter whether any previous branch of the IF construct had
succeeded. Fix that; add some test cases covering elif and nested IFs;
and improve the documentation, which also seemed a bit confused.
AFAICS the code has been like this since the feature was added in 1999
(commit b57b0e044). So while it's surely wrong, there might be code
out there relying on the current behavior. Hence, don't back-patch
into stable branches. It seems all right to fix it in v13 though.
Per report from Ashutosh Sharma. Reviewed by Ashutosh Sharma and
Michael Meskes.
Discussion: https://postgr.es/m/CAE9k0P=dQk9X0cU2tN49S7a9tv733-e1pVdpB1P-pWJ5PdTktg@mail.gmail.com
access_method, database_name, and index_name are all just name, and
they are not used consistently for their alleged purpose, so remove
them. They have been around since ancient times but have no current
reason for existing. Removing them can simplify future grammar
refactoring.
Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
Previously, the core scanner's yy_transition[] array had 37045 elements.
Since that number is larger than INT16_MAX, Flex generated the array to
contain 32-bit integers. By reimplementing some of the bulkier scanner
rules, this patch reduces the array to 20495 elements. The much smaller
total length, combined with the consequent use of 16-bit integers for
the array elements reduces the binary size by over 200kB. This was
accomplished in two ways:
1. Consolidate handling of quote continuations into a new start condition,
rather than duplicating that logic for five different string types.
2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
as three separate tokens, rather than one. The logic to de-escape
Unicode strings is moved to the filter code in parser.c, which already
had the ability to provide special processing for token sequences.
While we could have implemented the conversion in the grammar, that
approach was rejected for performance and maintainability reasons.
Performance in microbenchmarks of raw parsing seems equal or slightly
faster in most cases, and it's reasonable to expect that in real-world
usage (with more competition for the CPU cache) there will be a larger
win. The exception is UESCAPE sequences; lexing those is about 10%
slower, primarily because the scanner now has to be called three times
rather than one. This seems acceptable since that feature is very
rarely used.
The psql and epcg lexers are likewise modified, primarily because we
want to keep them all in sync. Since those lexers don't use the
space-hogging -CF option, the space savings is much less, but it's
still good for perhaps 10kB apiece.
While at it, merge the ecpg lexer's handling of C-style comments used
in SQL and in C. Those have different rules regarding nested comments,
but since we already have the ability to keep track of the previous
start condition, we can use that to handle both cases within a single
start condition. This matches the core scanner more closely.
John Naylor
Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.
By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.
Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
This reverts commit bd7c95f0c1,
along with assorted follow-on fixes. There are some questions
about the definition and implementation of that statement, and
we don't have time to resolve them before v13 release. Rather
than ship the feature and then have backwards-compatibility
concerns constraining any redesign, let's remove it for now
and try again later.
Discussion: https://postgr.es/m/TY2PR01MB2443EC8286995378AEB7D9F8F5B10@TY2PR01MB2443.jpnprd01.prod.outlook.com
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
Besides implementing the new statement this change fix some issues with the
parsing of PREPARE and EXECUTE statements. The different forms of these
statements are now all handled in a ujnified way.
Author: Matsumura-san <matsumura.ryo@jp.fujitsu.com>
The Bison documentation clearly states that a semicolon is required
after every grammar rule, and our scripts that generate ecpg's
grammar from the backend's implicitly assumed this is true. But it
turns out that only ancient versions of Bison actually enforce that.
There have been a couple of rules without trailing semicolons in
gram.y for some time, and as a consequence, ecpg's grammar was faulty
and produced wrong output for the affected statements.
To fix, add the missing semis, and add some cross-checks to ecpg's
scripts so that they'll bleat if we mess this up again.
The cases that were broken were:
* "SET variable = DEFAULT" (but not "SET variable TO DEFAULT"),
as well as allied syntaxes such as ALTER SYSTEM SET ... DEFAULT.
These produced syntactically invalid output that the server
would reject.
* Multiple type names in DROP TYPE/DOMAIN commands. Only the
first type name would be listed in the emitted command.
Per report from Daisuke Higuchi. Back-patch to all supported versions.
Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905DB51CE@g01jpexmbkw24
So far ECPG programs had to treat binary data for bytea column as 'char' type.
But this meant converting from/to escaped format with PQunescapeBytea/
PQescapeBytea() and therefore forcing users to add unnecessary code and cost
for the conversion in runtime. By adding a dedicated datatype for bytea most of
this special handling is no longer needed.
Author: Matsumura-san ("Matsumura, Ryo" <matsumura.ryo@jp.fujitsu.com>)
Discussion: https://postgr.es/m/flat/03040DFF97E6E54E88D3BFEE5F5480F737A141F9@G01JPEXMBYT04
DECLARE STATEMENT is a statement that lets users declare an identifier
pointing at a connection. This identifier will be used in other embedded
dynamic SQL statement such as PREPARE, EXECUTE, DECLARE CURSOR and so on.
When connecting to a non-default connection, the AT clause can be used in
a DECLARE STATEMENT once and is no longer needed in every dynamic SQL
statement. This makes ECPG applications easier and more efficient. Moreover,
writing code without designating connection explicitly improves portability.
Authors: Ideriha-san ("Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com>)
Kuroda-san ("Kuroda, Hayato" <kuroda.hayato@jp.fujitsu.com>)
Discussion: https://postgr.es/m4E72940DA2BF16479384A86D54D0988A565669DF@G01JPEXMBKW04
The function called can result in an out of memory error that subsequently was
disregarded. Instead it should set the appropriate SQL error variables and be
checked by whatever whenever statement is defined.
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.). Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.
Author: Surafel Temesgen
Reviewed-by: Tomas Vondra, Masahiko Sawada, Lim Myungkyu
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
Commit c0d0e54084 replaced the ones in the documentation, but missed out
on the ones in the code. Replace those as well, but unlike c0d0e54084,
don't backpatch the code changes to avoid breaking translations.
We've been speculating for a long time that hash-based keyword lookup
ought to be faster than binary search, but up to now we hadn't found
a suitable tool for generating the hash function. Joerg Sonnenberger
provided the inspiration, and sample code, to show us that rolling our
own generator wasn't a ridiculous idea. Hence, do that.
The method used here requires a lookup table of approximately 4 bytes
per keyword, but that's less than what we saved in the predecessor commit
afb0d0712, so it's not a big problem. The time savings is indeed
significant: preliminary testing suggests that the total time for raw
parsing (flex + bison phases) drops by ~20%.
Patch by me, but it owes its existence to Joerg Sonnenberger;
thanks also to John Naylor for review.
Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de
Previously, ScanKeywordLookup was passed an array of string pointers.
This had some performance deficiencies: the strings themselves might
be scattered all over the place depending on the compiler (and some
quick checking shows that at least with gcc-on-Linux, they indeed
weren't reliably close together). That led to very cache-unfriendly
behavior as the binary search touched strings in many different pages.
Also, depending on the platform, the string pointers might need to
be adjusted at program start, so that they couldn't be simple constant
data. And the ScanKeyword struct had been designed with an eye to
32-bit machines originally; on 64-bit it requires 16 bytes per
keyword, making it even more cache-unfriendly.
Redesign so that the keyword strings themselves are allocated
consecutively (as part of one big char-string constant), thereby
eliminating the touch-lots-of-unrelated-pages syndrome. And get
rid of the ScanKeyword array in favor of three separate arrays:
uint16 offsets into the keyword array, uint16 token codes, and
uint8 keyword categories. That reduces the overhead per keyword
to 5 bytes instead of 16 (even less in programs that only need
one of the token codes and categories); moreover, the binary search
only touches the offsets array, further reducing its cache footprint.
This also lets us put the token codes somewhere else than the
keyword strings are, which avoids some unpleasant build dependencies.
While we're at it, wrap the data used by ScanKeywordLookup into
a struct that can be treated as an opaque type by most callers.
That doesn't change things much right now, but it will make it
less painful to switch to a hash-based lookup method, as is being
discussed in the mailing list thread.
Most of the change here is associated with adding a generator
script that can build the new data structure from the same
list-of-PG_KEYWORD header representation we used before.
The PG_KEYWORD lists that plpgsql and ecpg used to embed in
their scanner .c files have to be moved into headers, and the
Makefiles have to be taught to invoke the generator script.
This work is also necessary if we're to consider hash-based lookup,
since the generator script is what would be responsible for
constructing a hash table.
Aside from saving a few kilobytes in each program that includes
the keyword table, this seems to speed up raw parsing (flex+bison)
by a few percent. So it's worth doing even as it stands, though
we think we can gain even more with a follow-on patch to switch
to hash-based lookup.
John Naylor, with further hacking by me
Discussion: https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
It's important for link commands to list *.o input files before -l
switches for libraries, as library code may not get pulled into the link
unless referenced by an earlier command-line entry. This is certainly
necessary for static libraries (.a style). Apparently on some platforms
it is also necessary for shared libraries, as reported by Donald Dong.
We often put -l switches for within-tree libraries into LDFLAGS, meaning
that link commands that list *.o files after LDFLAGS are hazardous.
Most of our link commands got this right, but a few did not. In
particular, places that relied on gmake's default implicit link rule
failed, because that puts LDFLAGS first. Fix that by overriding the
built-in rule with our own. The implicit link rules in
src/makefiles/Makefile.* for single-.o-file shared libraries mostly
got this wrong too, so fix them. I also changed the link rules for the
backend and a couple of other places for consistency, even though they
are not (currently) at risk because they aren't adding any -l switches
to LDFLAGS.
Arguably, the real problem here is that we're abusing LDFLAGS by
putting -l switches in it and we should stop doing that. But changing
that would be quite invasive, so I'm not eager to do so.
Perhaps this is a candidate for back-patching, but so far it seems
that problems can only be exhibited in test code we don't normally
build, and at least some of the problems are new in HEAD anyway.
So I'll refrain for now.
Donald Dong and Tom Lane
Discussion: https://postgr.es/m/CAKABAquXn-BF-vBeRZxhzvPyfMqgGuc74p8BmQZyCFDpyROBJQ@mail.gmail.com
Re-making ecpglib's typename.o is dangerous because another make thread
could be doing that at the same time. While we've not heard field
complaints traceable to this, it seems inevitable that it'd bite someone
eventually. Instead, symlink typename.c into the preproc directory and
recompile it there. That file is small enough that compiling it twice
isn't much of a penalty. Furthermore, this way we get a .o file that's
made without shlib CFLAGS, which seems cleaner.
This requires adding more stuff to the module's -I list. The MSVC
aspect of that is untested, but I'm sure the buildfarm will tell me
if I got it wrong.
Per a suggestion from Peter Eisentraut. Although this is theoretically
a bug fix, the lack of field reports makes me feel we needn't back-patch.
Discussion: https://postgr.es/m/31364.1543511708@sss.pgh.pa.us
This should reduce confusion, and in particular make it safe to
copy typename.c into preproc/ and compile it there.
This doesn't affect anything outside ecpg, and particularly not
end users, because these files don't get installed; they just
exist to share declarations among the .c files of each subdirectory.
Discussion: https://postgr.es/m/31364.1543511708@sss.pgh.pa.us
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de