doc: Clarify collation requirements for base32hex sortability.

While fixing the base32hex UUID sortability test in commit 89210037a0, it turned out that the expected lexicographical order is only maintained under the C collation (or an equivalent byte-wise collation). Natural language collations may employ different rules, breaking the sortability. This commit updates the documentation to explicitly state that base32hex is "byte-wise sortable", ensuring users do not fall into the trap of using natural language collations when querying their encoded data.

Co-Authored-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
2026-05-16 11:29:49 -04:00 · 2026-03-27 12:13:29 -07:00 · 2026-03-27 12:13:29 -07:00 · e752a2ccc9
commit e752a2ccc9
parent d7965d65fc
1 changed file with 11 additions and 3 deletions
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -778,18 +778,26 @@
       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
       (<literal>0</literal>-<literal>9</literal> and
-       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
-       sort order of the encoded data. The <function>encode</function> function
+       <literal>A</literal>-<literal>V</literal>) which preserves the sort order of
+       the encoded data when compared byte-wise. The <function>encode</function> function
       produces output padded with <literal>'='</literal>, while <function>decode</function>
       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
       whitespace characters.
      </para>
      <para>
-       This format is useful for encoding UUIDs in a compact, sortable format:
+       This format is useful for encoding UUIDs in a compact, byte-wise sortable format:
       <literal>rtrim(encode(uuid_value::bytea, 'base32hex'), '=')</literal>
       produces a 26-character string compared to the standard 36-character
       UUID representation.
      </para>
+      <note>
+       <para>
+        To maintain the lexicographical sort order of the encoded data,
+        ensure that the text is sorted using the C collation
+        (e.g., using <literal>COLLATE "C"</literal>). Natural language
+        collations may sort characters differently and break the ordering.
+       </para>
+      </note>
     </listitem>
    </varlistentry>