postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-29 18:32:53 -04:00

Author	SHA1	Message	Date
Tom Lane	7204f35919	Revert commit `66c0185a3` and follow-on patches. This reverts `66c0185a3` (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits `d5d2205c8`, `3b1a7eb28`, `7487044d6`. In addition to those, `07746a8ef` had to be removed then re-applied in a different place, because `66c0185a3` moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)	2024-05-20 15:08:30 -04:00
David Rowley	66c0185a3d	Allow planner to use Merge Append to efficiently implement UNION Until now, UNION queries have often been suboptimal as the planner has only ever considered using an Append node and making the results unique by either using a Hash Aggregate, or by Sorting the entire Append result and running it through the Unique operator. Both of these methods always require reading all rows from the union subqueries. Here we adjust the union planner so that it can request that each subquery produce results in target list order so that these can be Merge Appended together and made unique with a Unique node. This can improve performance significantly as the union child can make use of the likes of btree indexes and/or Merge Joins to provide the top-level UNION with presorted input. This is especially good if the top-level UNION contains a LIMIT node that limits the output rows to a small subset of the unioned rows as cheap startup plans can be used. Author: David Rowley Reviewed-by: Richard Guo, Andy Fan Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com	2024-03-25 14:31:14 +13:00
Jeff Davis	f696c0cd5f	Catalog changes preparing for builtin collation provider. Rename pg_collation.colliculocale to colllocale, and pg_database.daticulocale to datlocale. These names reflects that the fields will be useful for the upcoming builtin provider as well, not just for ICU. This is purely a rename; no changes to the meaning of the fields. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Peter Eisentraut	2024-03-09 14:48:18 -08:00
Jeff Davis	f3a01af29b	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. Older versions of ICU canonicalize "C" to "en-US-u-va-posix"; but starting in ICU version 64, the "C" locale is considered obsolete. Postgres commit `ea1db8ae70` introduced code to always canonicalize "C" to "en-US-u-va-posix" for consistency and convenience, but it was deemed too confusing. This commit removes that code, so that "C" is treated like other ICU locale names: canonicalization is attempted, and if it fails, the behavior is controlled by icu_validation_level. A similar change was previously committed as `f7faa9976c`, then reverted due to an ICU-version-dependent test failure. This commit un-reverts it, omitting the test because we now expect the behavior to depend on the version of ICU being used. Discussion: https://postgr.es/m/3a200aca-4672-4b37-fc91-5d198a323503%40eisentraut.org Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-06-21 13:18:25 -07:00
Jeff Davis	2535c74b1a	initdb: change default --locale-provider back to libc. Reverts `27b62377b4`. Discussion: https://postgr.es/m/eff031036baa07f325de29215371a4c9e69d61f3.camel@j-davis.com Discussion: https://postgr.es/m/3353947.1682092131@sss.pgh.pa.us	2023-06-21 11:10:03 -07:00
Peter Eisentraut	b0f6c43716	Remove read-only server settings lc_collate and lc_ctype The GUC settings lc_collate and lc_ctype are from a time when those locale settings were cluster-global. When those locale settings were made per-database (PG 8.4), the settings were kept as read-only. As of PG 15, you can use ICU as the per-database locale provider, so examining these settings is already less meaningful and possibly confusing, since you need to look into pg_database to find out what is really happening, and they would likely become fully obsolete in the future anyway. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/696054d1-bc88-b6ab-129a-18b8bce6a6f0@enterprisedb.com	2023-06-07 16:57:06 +02:00
Jeff Davis	6de31ce446	Reduce icu_validation_level default to WARNING. Discussion: https://postgr.es/m/daa9f060aa2349ebc84444515efece49e7b32c5d.camel@j-davis.com	2023-05-17 13:18:40 -07:00
Jeff Davis	455f948b0d	Revert "ICU: do not convert locale 'C' to 'en-US-u-va-posix'." This reverts commit `f7faa9976c`. Discussion: https://postgr.es/m/483826.1683582475@sss.pgh.pa.us	2023-05-08 20:50:51 -07:00
Jeff Davis	f7faa9976c	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. The conversion was intended to be for convenience, but it's more likely to be confusing than useful. The user can still directly specify 'en-US-u-va-posix' if desired. Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-05-08 10:34:51 -07:00
Jeff Davis	ea1db8ae70	Canonicalize ICU locale names to language tags. Convert to BCP47 language tags before storing in the catalog, except during binary upgrade or when the locale comes from an existing collation or template database. The resulting language tags can vary slightly between ICU versions. For instance, "@colBackwards=yes" is converted to "und-u-kb-true" in older versions of ICU, and to the simpler (but equivalent) "und-u-kb" in newer versions. The process of canonicalizing to a language tag also understands more input locale string formats than ucol_open(). For instance, "fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is ignored; effectively treating it the same as the locale "fr" and opening the wrong collator. Canonicalization properly interprets the language and region, resulting in the language tag "fr-CA", which can then be understood by ucol_open(). This commit fixes a problem in prior versions due to ucol_open() misinterpreting locale strings as described above. For instance, creating an ICU collation with locale "fr_CA.UTF-8" would store that string directly in the catalog, which would later be passed to (and misinterpreted by) ucol_open(). After this commit, the locale string will be canonicalized to language tag "fr-CA" in the catalog, which will be properly understood by ucol_open(). Because this fix affects the resulting collator, we cannot change the locale string stored in the catalog for existing databases or collations; otherwise we'd risk corrupting indexes. Therefore, only canonicalize locales for newly-created (not upgraded) collations/databases. For similar reasons, do not backport. Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-04-04 10:38:58 -07:00
Jeff Davis	1671f990dd	Validate ICU locales. For ICU collations, ensure that the locale's language exists in ICU, and that the locale can be opened. Basic validation helps avoid minor mistakes and misspellings, which often fall back to the root locale instead of the intended locale. It's even more important to avoid such mistakes in ICU versions 54 and earlier, where the same (misspelled) locale string could fall back to different locales depending on the environment. Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-28 16:34:29 -07:00
Jeff Davis	3b50275b12	Handle the "und" locale in ICU versions 54 and older. The "und" locale is an alternative spelling of the root locale, but it was not recognized until ICU 55. To maintain common behavior across all supported ICU versions, check for "und" and replace with "root" before opening. Previously, the lack of support for "und" was dangerous, because versions 54 and older fall back to the environment when a locale is not found. If the user specified "und" for the language (which is expected and documented), it could not only resolve to the wrong collator, but it could unexpectedly change (which could lead to corrupt indexes). This effectively reverts commit `d72900bded`, which worked around the problem for the built-in "unicode" collation, and is no longer necessary. Discussion: https://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com Discussion: https://postgr.es/m/0c6fa66f2753217d2a40480a96bd2ccf023536a1.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-23 10:08:27 -07:00
Jeff Davis	869650fa86	Support language tags in older ICU versions (53 and earlier). By calling uloc_canonicalize() before parsing the attributes, the existing locale attribute parsing logic works on language tags as well. Fix a small memory leak, too. Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-21 16:12:37 -07:00
Peter Eisentraut	3e623ebc7a	Fix tests for non-ICU build missed in `0d21d4b9bc`	2023-03-10 14:27:55 +01:00
Peter Eisentraut	0d21d4b9bc	Add standard collation UNICODE This adds a new predefined collation named UNICODE, which sorts by the default Unicode collation algorithm specifications, per SQL standard. This only works if ICU support is built. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d3c2@enterprisedb.com	2023-03-10 13:35:43 +01:00
Peter Eisentraut	012ee84259	Add a test for UCS_BASIC collation	2023-03-10 11:18:08 +01:00
Peter Eisentraut	30a53b7929	Allow tailoring of ICU locales with custom rules This exposes the ICU facility to add custom collation rules to a standard collation. New options are added to CREATE COLLATION, CREATE DATABASE, createdb, and initdb to set the rules. Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com	2023-03-08 16:56:37 +01:00
Peter Eisentraut	f2553d4306	Add option to use ICU as global locale provider This adds the option to use ICU as the default locale provider for either the whole cluster or a database. New options for initdb, createdb, and CREATE DATABASE are used to select this. Since some (legacy) code still uses the libc locale facilities directly, we still need to set the libc global locale settings even if ICU is otherwise selected. So pg_database now has three locale-related fields: the existing datcollate and datctype, which are always set, and a new daticulocale, which is only set if ICU is selected. A similar change is made in pg_collation for consistency, but in that case, only the libc-related fields or the ICU-related field is set, never both. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com	2022-03-17 11:13:16 +01:00
Peter Eisentraut	37851a8b83	Database-level collation version tracking This adds to database objects the same version tracking that collation objects have. There is a new pg_database column datcollversion that stores the version, a new function pg_database_collation_actual_version() to get the version from the operating system, and a new subcommand ALTER DATABASE ... REFRESH COLLATION VERSION. This was not originally added together with pg_collation.collversion, since originally version tracking was only supported for ICU, and ICU on a database-level is not currently supported. But we now have version tracking for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so this is useful to have now. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com	2022-02-14 08:27:26 +01:00
Thomas Munro	ec48314708	Revert per-index collation version tracking feature. Design problems were discovered in the handling of composite types and record types that would cause some relevant versions not to be recorded. Misgivings were also expressed about the use of the pg_depend catalog for this purpose. We're out of time for this release so we'll revert and try again. Commits reverted: `1bf946bd`: Doc: Document known problem with Windows collation versions. `cf002008`: Remove no-longer-relevant test case. `ef387bed`: Fix bogus collation-version-recording logic. `0fb0a050`: Hide internal error for pg_collation_actual_version(<bad OID>). `ff942057`: Suppress "warning: variable 'collcollate' set but not used". `d50e3b1f`: Fix assertion in collation version lookup. `f24b1569`: Rethink extraction of collation dependencies. `257836a7`: Track collation versions for indexes. `cd6f479e`: Add pg_depend.refobjversion. `7d1297df`: Remove pg_collation.collversion. Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com	2021-05-07 21:10:11 +12:00
Tom Lane	f24b156997	Rethink extraction of collation dependencies. As it stands, find_expr_references_walker() pays attention to leaf-node collation fields while ignoring the input collations of actual function and operator nodes. That seems exactly backwards from a semantic standpoint, and it leads to reporting dependencies on collations that really have nothing to do with the expression's behavior. Hence, rewrite to look at function input collations instead. This isn't completely perfect either; it fails to account for the behavior of record_eq and its siblings. (The previous coding at least gave an approximation of that, though I think it could be fooled pretty easily into considering the columns of irrelevant composite types.) We may be able to improve on this later, but for now this should satisfy the buildfarm members that didn't like `ef387bed8`. In passing fix some oversights in GetTypeCollations(), and get rid of its duplicative de-duplications. (I'm worried that it's still potentially O(N^2) or worse, but this makes it a little better.) Discussion: https://postgr.es/m/3564817.1618420687@sss.pgh.pa.us	2021-04-16 22:23:46 -04:00
Tom Lane	cf0020080a	Remove no-longer-relevant test case. collate.icu.utf8.sql was exercising the recording of a collation dependency for an enum comparison expression, but such an expression should never have had any collation dependency in the first place. After I fixed that in commit `c402b02b9`, the test started failing. We don't need to test that scenario anymore, so just remove the now-useless test steps. (This test case is new in HEAD, so no need to back-patch.) Discussion: https://postgr.es/m/3044030.1618261159@sss.pgh.pa.us Discussion: https://postgr.es/m/HK0PR01MB22744393C474D503E16C8509F4709@HK0PR01MB2274.apcprd01.prod.exchangelabs.com	2021-04-12 18:58:20 -04:00
Thomas Munro	8556267b2b	Revert "pg_collation_actual_version() -> pg_collation_current_version()." This reverts commit `9cf184cc05`. Name change less well received than anticipated. Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org	2021-02-26 15:29:27 +13:00
Thomas Munro	9cf184cc05	pg_collation_actual_version() -> pg_collation_current_version(). The new name seems a bit more natural. Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:32:16 +13:00
Thomas Munro	0fb0a0503b	Hide internal error for pg_collation_actual_version(<bad OID>). Instead of an unsightly internal "cache lookup failed" message, just return NULL for bad OIDs, as is the convention for other similar things. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:01:20 +13:00
Peter Eisentraut	560564d3ad	Enable hash partitioning of text arrays hash_array_extended() needs to pass PG_GET_COLLATION() to the hash function of the element type. Otherwise, the hash function of a collation-aware data type such as text will error out, since the introduction of nondeterministic collation made hash functions require a collation, too. The consequence of this is that before this change, hash partitioning using an array over text in the partition key would not work. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/32c1fdae-95c6-5dc6-058a-a90330a3b621%40enterprisedb.com	2020-11-04 12:46:28 +01:00
Thomas Munro	257836a755	Track collation versions for indexes. Record the current version of dependent collations in pg_depend when creating or rebuilding an index. When accessing the index later, warn that the index may be corrupted if the current version doesn't match. Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe, Michael Paquier, Robert Haas, Tom Lane and others for very helpful discussion. Author: Thomas Munro <thomas.munro@gmail.com> Author: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions) Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com	2020-11-03 01:19:50 +13:00
Thomas Munro	7d1297df08	Remove pg_collation.collversion. This model couldn't be extended to cover the default collation, and didn't have any information about the affected database objects when the version changed. Remove, in preparation for a follow-up commit that will add a new mechanism. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com	2020-11-03 00:44:59 +13:00
Michael Paquier	96e7e1bc08	Fix random regression failure in test case "collate.icu.utf8" This is a fix similar to `2d7d67cc`, where slight plan alteration can cause a random failure of this regression test because of an incorect tuple ordering, except that this one involves lookups of pg_type. Similarly to the other case, add ORDER BY clauses to ensure the output order. The failure has been seen at least once on buildfarm member skink. Reported-by: Thomas Munro Discussion: https://postgr.es/m/CA+hUKGLjR9ZBvhXcr9b-NSBHPw9aRgbjyzGE+kqLsT4vwX+nkQ@mail.gmail.com Backpatch-through: 12	2019-08-14 13:37:48 +09:00
Tom Lane	03c811a483	Fix planner's test for case-foldable characters in ILIKE with ICU. As coded, the ICU-collation path in pattern_char_isalpha() failed to consider regular ASCII letters to be case-varying. This led to like_fixed_prefix treating too much of an ILIKE pattern as being a fixed prefix, so that indexscans derived from an ILIKE clause might miss entries that they should find. Per bug #15892 from James Inform. This is an oversight in the original ICU patch (commit `eccfef81e`), so back-patch to v10 where that came in. Discussion: https://postgr.es/m/15892-e5d2bea3e8a04a1b@postgresql.org	2019-08-12 13:15:47 -04:00
Peter Eisentraut	f140007050	Run UTF8-requiring collation tests by default The tests collate.icu.utf8 and collate.linux.utf8 were previously only run when explicitly selected via EXTRA_TESTS. They require a UTF8 database, because the error messages in the expected files refer to that, and they use some non-ASCII characters in the tests. Since users can select any locale and encoding for the regression test run, it was not possible to include these tests automatically. To fix, use psql's \if facility to check various prerequisites such as platform and the server encoding and quit the tests at the very beginning if the configuration is not adequate. We then need to maintain alternative expected files for these tests, but they are very tiny and never need to change after this. These two tests are now run automatically as part of the regression tests. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/052295c2-a2e1-9a21-bd36-8fbff8686cf3%402ndquadrant.com	2019-07-31 09:46:51 +02:00
Tom Lane	8994cc6ffc	Add ORDER BY to more ICU regression test cases. Commit `c77e12208` didn't fully fix the problem. Per buildfarm and local testing.	2019-03-26 17:46:04 -04:00
Peter Eisentraut	c77e12208c	Add ORDER BY to regression test case Apparently, the output order is different on different endianness, per build farm member snapper.	2019-03-25 08:15:38 +01:00
Peter Eisentraut	638db07814	Fix ICU tests for older ICU versions Change the tests to use old-style ICU locale specifications so that they can run on older ICU versions.	2019-03-22 14:40:56 +01:00
Peter Eisentraut	5e1963fb76	Collations with nondeterministic comparison This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com	2019-03-22 12:12:43 +01:00
Peter Eisentraut	1f050c08f9	Fix bug in support for collation attributes on older ICU versions Unrecognized attribute names are supposed to be ignored. But the code would error out on an unrecognized attribute value even if it did not recognize the attribute name. So unrecognized attributes wouldn't really be ignored unless the value happened to be one that matched a recognized value. This would break some important cases where the attribute would be processed by ucol_open() directly. Fix that and add a test case. The restructured code should also avoid compiler warnings about initializing a UColAttribute value to -1, because the type might be an unsigned enum. (reported by Andres Freund)	2019-03-19 09:37:46 +01:00
Peter Eisentraut	b8f9a2a69a	Add support for collation attributes on older ICU versions Starting in ICU 54, collation customization attributes can be specified in the locale string, for example "@colStrength=primary;colCaseLevel=yes". Add support for this for older ICU versions as well, by adding some minimal parsing of the attributes in the locale string and calling ucol_setAttribute() on them. This is essentially what never ICU versions do internally in ucol_open(). This was we can offer this functionality in a consistent way in all ICU versions supported by PostgreSQL. Also add some tests for ICU collation customization. Reported-by: Daniel Verite <daniel@manitou-mail.org> Discussion: https://www.postgresql.org/message-id/0270ebd4-f67c-8774-1a5a-91adfb9bb41f@2ndquadrant.com	2019-03-17 08:47:15 +01:00
Peter Eisentraut	727921f466	Hide cascade messages in collate tests These are not relevant to the tests and would just uselessly bloat patches.	2019-02-06 22:17:57 +01:00
Peter Eisentraut	e148009740	Remove ICU tests from default run These tests require the test database to be in UTF8 encoding. Until there is a better solution, take them out of the default test set and treat them like the existing collate.linux.utf8 test, meaning it has to be selected manually.	2017-03-25 00:30:26 -04:00

39 commits