In passing, throw an error if the AF count is too small, rather than
just silently discarding extra affix entries.
Note that the new regression test cases require installing the
updated src/backend/tsearch/dicts files.
Arthur Zakirov
Discussion: https://postgr.es/m/20180413113447.GA32474@zakirov.localdomain
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
Now it's possible to load recent version of Hunspell for several languages.
To handle these dictionaries Hunspell patch adds support for:
* FLAG long - sets the double extended ASCII character flag type
* FLAG num - sets the decimal number flag type (from 1 to 65535)
* AF parameter - alias for flag's set
Also it moves test dictionaries into separate directory.
Author: Artur Zakirov with editorization by me
categories, as per discussion. asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case. But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before. This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages. In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words". The hyphenated-word categories are adjusted
similarly.
Rename synonym.syn.sample and thesaurs.ths.sample to
synonym_sample.syn and thesaurs_sample.ths accordingly to be able to use they
in regression test.
Ispell dictionary uses synthetic simple dictionary files.