Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:
* In actual radix sort, it would only bucket the least significant
byte from each wchar, ignoring the 24 most-significant bits of each
unicode character.
* In degenerate cases / "fast paths," it would fall back to another
sorting algorithm (default: mergesort) with a bogus comparator
offset. The string comparison functions in sort(1) take an offset
in units of the operating character size. However, radixsort was
passing an offset in units of bytes. The byte offset must be
divided by sizeof(wchar_t).
This revision addresses both discovered issues.
Some example testcases:
$ (echo 耳 ; echo 脳 ; echo 耳) | \
LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug
$ (echo 耳 ; echo 脳 ; echo 耳) | \
LC_CTYPE=C LC_COLLATE=C LANG=C sort --radixsort --debug
$ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug
PR: 247494
Reported by: knu
MFC after: I do not intend to, but parties interested in stable might want to
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
Worker threads now use a pthread_cond_t to wait for work instead of
burning the cpu up.
Obtained from: DragonflyBSD (07774aea0ccf64a48fcfad8899e3bf7c8f18277a)
MFC after: 2 weeks
zero-length array are dynamically sized at run-time based on the use
of hints, compilers can't be expected to figure out these offsets on
their own. [1]
- Fix incorrect comparison in cmp_nans(). [2]
PR: 204571 [1], 202301 [2]
Submitted by: David Binderman [2]
MFC after: 3 days
- Change default sort method to mergesort, which has a better worst case
performance than qsort
Submitted by: Oleg Moskalenko <oleg.moskalenko@citrix.com>
- Do not use mmap() by default; it can be enabled by --mmap
- Add some minor optimizations for -u
- Update manual page according to the changes
Submitted by: Oleg Moskalenko <oleg.moskalenko@citrix.com>
with the major functionality and optimizations by Oleg Moskalenko.
It is compatible with the latest version of POSIX and the current GNU sort
version that we have in base. Beside this, it implements all the
functionality introduced in later versions of GNU sort. For now, it will
be installed as "bsdsort", keeping GNU sort as the default sort
implementation.