From e752a2ccc98f324e7013cc4eabc1998d5a1020d0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Fri, 27 Mar 2026 12:13:29 -0700
Subject: [PATCH] doc: Clarify collation requirements for base32hex sortability.

While fixing the base32hex UUID sortability test in commit 89210037a0a,
it turned out that the expected lexicographical order is only
maintained under the C collation (or an equivalent byte-wise
collation). Natural language collations may employ different rules,
breaking the sortability.

This commit updates the documentation to explicitly state that
base32hex is "byte-wise sortable", ensuring users do not fall into the
trap of using natural language collations when querying their encoded
data.

Co-Authored-by: Andrey Borodin
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..dc6b7e57ea7 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -778,18 +778,26 @@
 RFC 4648 Section 7. It uses the extended hex alphabet (0-9 and
- A-V) which preserves the lexicographical
- sort order of the encoded data. The encode function
+ A-V) which preserves the sort order of
+ the encoded data when compared byte-wise. The encode function
 produces output padded with '=', while decode accepts both padded and unpadded input. Decoding is case-insensitive and ignores whitespace characters.
- This format is useful for encoding UUIDs in a compact, sortable format:
+ This format is useful for encoding UUIDs in a compact, byte-wise sortable format:
 rtrim(encode(uuid_value::bytea, 'base32hex'), '=') produces a 26-character string compared to the standard 36-character UUID representation.
+
+
+ To maintain the lexicographical sort order of the encoded data, ensure that the text is sorted using the C collation (e.g., using COLLATE "C"). Natural language collations may sort characters differently and break the ordering.
+
+