Commit graph

130 commits

Author SHA1 Message Date
Baptiste Daroussin
b4572fe565 sort: deindent file_reader_free and cleanup its usage
(cherry picked from commit 226e41467e)
2022-10-19 09:59:31 +02:00
Baptiste Daroussin
1cdb98e725 sort: simplify file_reader_clean
Deindent the function, remove useless tests:
 - free already tests if its argument is NULL
 - closefile already tests if the input is stdin or NULL

(cherry picked from commit ffd41d39c6)
2022-10-19 09:59:31 +02:00
Baptiste Daroussin
0821423134 sort: deindent closefile
(cherry picked from commit f9d9a7cc4f)
2022-10-19 09:59:30 +02:00
Baptiste Daroussin
67ca992f5c sort: use asprintf(3) instead of malloc + snprintf(3)
(cherry picked from commit 48a53cc484)
2022-10-19 09:59:30 +02:00
Baptiste Daroussin
e3231f459f sort: deindent openfile
(cherry picked from commit 958b0d4642)
2022-10-19 09:59:30 +02:00
Baptiste Daroussin
e891183e60 sort: use memset to initialize structure when possible
(cherry picked from commit f02c783757)
2022-10-19 09:59:29 +02:00
Baptiste Daroussin
2afc73005c sort: unify the code to read from FILE *
Previously the code to read from a local file or stdin was separated.
After the change to remove the home made line reader used for stdin
(replaced by getdelim) it appears that the rest of the code which is
used to read from any FILE * but stdin can benefit from the exact same
change.

(cherry picked from commit 8b9071360a)
2022-10-19 09:59:29 +02:00
Baptiste Daroussin
d068b2c1ea sort: remove unused function
(cherry picked from commit e8815fb30b)
2022-10-19 09:59:29 +02:00
Baptiste Daroussin
cacedfd1df sort: simplify the code to handle -z flag
(cherry picked from commit f079ef8aa4)
2022-10-19 09:59:29 +02:00
Baptiste Daroussin
713ef741b7 sort: cleanup now unused structure and prototypes
(cherry picked from commit 4d4fcf619e)
2022-10-19 09:59:28 +02:00
Baptiste Daroussin
1286974328 sort: use mkstemp(3) instead of reinventing it
MFC After:	1 week

(cherry picked from commit 3f9e5e59bd)
2022-10-19 09:59:28 +02:00
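As a rough illustration of what relying on mkstemp(3) looks like (not the committed code), the sketch below builds a template under a directory and lets mkstemp(3) create and open the file in one step. The helper name make_tmp_file and the "sort.XXXXXXXXXX" prefix are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a temporary file under `dir`.  On success the final path is
 * left in `path` and an open descriptor is returned; on failure -1 is
 * returned.  The "sort.XXXXXXXXXX" prefix is hypothetical.
 */
int
make_tmp_file(const char *dir, char *path, size_t pathsz)
{
	int n;

	assert(dir != NULL && path != NULL);
	n = snprintf(path, pathsz, "%s/sort.XXXXXXXXXX", dir);
	if (n < 0 || (size_t)n >= pathsz)
		return (-1);	/* template did not fit in the buffer */
	return (mkstemp(path));	/* picks a unique name and opens it */
}
```

The caller owns the descriptor and the file, so it would normally close(2) the former and unlink(2) the latter when done.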
Baptiste Daroussin
489c9df198 sort: replace home made line reader by getdelim(3)
The previous code had a bug when reading lines with an unexpected
encoding: it returned without the full line being captured.
This resulted in sort complaining with "sort: Illegal byte sequence".

Using getdelim(3) instead of the home made code fixes the situation.

PR:		241679
Reported by:	Ronald F. Guilmette <rfg-freebsd@tristatelogic.com>
MFC After:	1 week
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D36948

(cherry picked from commit b58094c0d9)
2022-10-19 09:59:28 +02:00
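A minimal sketch of a getdelim(3)-based reader, assuming a helper named read_record (hypothetical, not taken from sort(1)). Because getdelim(3) works on raw bytes, a record is returned whole regardless of whether its bytes are valid in the current encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Read one record terminated by `delim` ('\n' normally, '\0' for a
 * NUL-separated input).  Returns a malloc'd buffer with the delimiter
 * stripped, or NULL at end of file or on error.  The record length in
 * bytes is stored in *lenp.
 */
char *
read_record(FILE *fp, int delim, size_t *lenp)
{
	char *buf = NULL;
	size_t cap = 0;
	ssize_t n;

	assert(fp != NULL && lenp != NULL);
	n = getdelim(&buf, &cap, delim, fp);
	if (n < 0) {
		free(buf);	/* getdelim may allocate even on failure */
		return (NULL);
	}
	if (n > 0 && buf[n - 1] == delim)
		buf[--n] = '\0';
	*lenp = (size_t)n;
	return (buf);
}
```

The caller frees each returned buffer; reusing one buffer across calls is also possible with getdelim(3) but is left out here for brevity.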
Baptiste Daroussin
accbb97f87 sort: add wrapper around calloc
(cherry picked from commit a312f3e742)
2022-10-19 09:56:51 +02:00
Baptiste Daroussin
47204fd342 sort: replace malloc+memset with calloc
(cherry picked from commit ecc3c29167)
2022-10-19 09:56:50 +02:00
Doug Rabson
55f186a13a Move sort to runtime
Allows pkg bootstrap without having to install FreeBSD-utilities

(cherry picked from commit 0c19c4db74)
2022-08-19 14:27:16 +01:00
Mark Johnston
1a233fd317 sort: Fix message catalogue usage
- Check that catopen() succeeded before calling catclose().  musl will
  crash in the latter if the catalogue descriptor is -1.
- Keep the message catalogue open for most of sort(1)'s actual
  operation.
- Don't use catgets(3) to print error messages if catopen(3) had failed.

Reviewed by:	arichardson, emaste
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 8d8b9b560a)
2022-02-04 09:58:13 -05:00
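The three points above can be sketched as two small guards around the catalogue descriptor. The helper names and the set number 1 are assumptions for illustration; only catopen(3), catgets(3) and catclose(3) are real interfaces.

```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>

/* Value that catopen(3) returns on failure. */
#define CAT_FAILED ((nl_catd)-1)

/*
 * Look up a message, falling back to the built-in text when no
 * catalogue could be opened.
 */
const char *
get_msg(nl_catd cat, int msg_id, const char *fallback)
{
	assert(fallback != NULL);
	if (cat == CAT_FAILED)
		return (fallback);
	return (catgets(cat, 1, msg_id, fallback));
}

/* Close the catalogue only if catopen(3) actually succeeded. */
int
close_catalogue(nl_catd cat)
{
	if (cat == CAT_FAILED)
		return (0);
	return (catclose(cat));
}
```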
Mark Johnston
946a297fbd sort: Fix random sort
bwsrawdata() is supposed to return the string buffer.

PR:		259451
Reported by:	sigsys@gmail.com
Fixes:		d053fb22f6 ("usr.bin/sort: Avoid UBSan errors")
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit e9bfb50d5e)
2021-11-01 09:10:54 -04:00
Alex Richardson
e4c2ffe932 usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.

Split out from D28233.

Reviewed By:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31009

(cherry picked from commit d053fb22f6)
2021-08-05 09:57:45 +01:00
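For readers unfamiliar with the pattern, here is a small sketch of a structure that keeps its length next to a flexible array member. The type and function names are hypothetical and are not the ones used in bwstring.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type: length and data share one allocation. */
struct counted_str {
	size_t	len;	/* a named member must precede the array */
	char	data[];	/* flexible array member, not data[0] */
};

/* Allocate a counted string holding a copy of `len` bytes from `src`. */
struct counted_str *
counted_str_new(const char *src, size_t len)
{
	struct counted_str *s;

	assert(src != NULL);
	s = malloc(sizeof(*s) + len + 1);
	if (s == NULL)
		return (NULL);
	s->len = len;
	memcpy(s->data, src, len);
	s->data[len] = '\0';
	return (s);
}
```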
Mark Johnston
b98b323813 sort: Hook NetBSD tests up to the build
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 186ba88a7c)
2021-05-20 09:15:49 -04:00
Cyril Zhang
df40dcbf7c sort: Cache value of MB_CUR_MAX
Every usage of MB_CUR_MAX results in a call to __mb_cur_max.  This is
inefficient and redundant.  Caching the value of MB_CUR_MAX in a global
variable removes these calls and speeds up the runtime of sort.  For
numeric sorting, runtime is almost halved in some tests.

PR:		255551
PR:		255840
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30170

(cherry picked from commit 71ec05a212)
2021-05-20 09:15:43 -04:00
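The caching idea can be sketched as follows, assuming a global named cached_mb_cur_max (an illustrative name). The value is read once after the locale is set, and hot paths then compare against the plain variable.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Cached copy of MB_CUR_MAX; the name is illustrative. */
size_t cached_mb_cur_max = 1;

/* Call once after setlocale(3), and again if the locale changes. */
void
cache_mb_cur_max(void)
{
	cached_mb_cur_max = MB_CUR_MAX;
	assert(cached_mb_cur_max >= 1);
}

/* Hot paths test the cached value instead of expanding MB_CUR_MAX. */
int
is_single_byte_locale(void)
{
	return (cached_mb_cur_max == 1);
}
```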
Cyril Zhang
f80d1c0035 sort: Stop "fixing" obsolete key syntax after -- flag
PR:		255798
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30234

(cherry picked from commit fa43162c63)
2021-05-20 09:15:40 -04:00
Alex Richardson
266b51ac6e Fix -Wpointer-sign warnings in bwstring.c
2020-09-10 15:37:19 +00:00
Gordon Bergling
2d955e4199 sort(1): Remove duplicate option check
Reviewed by:	lwhsu, emaste
Approved by:	emaste
Obtained from:	DragonFlyBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23892
2020-09-08 15:01:49 +00:00
Conrad Meyer
9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort
Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

  * In actual radix sort, it would only bucket the least significant
    byte from each wchar, ignoring the 24 most-significant bits of each
    unicode character.

  * In degenerate cases / "fast paths," it would fall back to another
    sorting algorithm (default: mergesort) with a bogus comparator
    offset.  The string comparison functions in sort(1) take an offset
    in units of the operating character size.  However, radixsort was
    passing an offset in units of bytes.  The byte offset must be
    divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=C LC_COLLATE=C LANG=C           sort --radixsort --debug

  $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
2020-06-23 16:43:48 +00:00
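One way to picture the two issues described above, as a sketch only: the radix key has to walk every byte of each wide character, and a byte-level position has to be converted to wchar_t units before it is handed to a string comparator. Both function names are hypothetical, and the most-significant-byte-first order is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Convert a byte offset into an offset in wchar_t units. */
size_t
byte_to_wchar_offset(size_t byte_off)
{
	return (byte_off / sizeof(wchar_t));
}

/*
 * Radix key for a given byte level: levels run over every byte of
 * every wide character, most significant byte first, rather than
 * over the low byte only.  Positions past the end yield 0.
 */
unsigned int
radix_key(const wchar_t *s, size_t len, size_t level)
{
	size_t idx = level / sizeof(wchar_t);
	size_t shift = 8 * (sizeof(wchar_t) - 1 - level % sizeof(wchar_t));

	assert(s != NULL);
	if (idx >= len)
		return (0);
	return (((unsigned long)s[idx] >> shift) & 0xff);
}
```

With this keying, U+8033 and U+8133 differ in their second-lowest byte, so they land in different buckets even though a scheme that looks only at the low byte would see 0x33 for both.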
Simon J. Gerraty
2c9a9dfc18 Update Makefile.depend files
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22494
2019-12-11 17:37:53 +00:00
Simon J. Gerraty
5ab1c5846f Add Makefile.depend.options
Leaf directories that have dependencies impacted
by options need a Makefile.depend.options file
to avoid churn in Makefile.depend

DIRDEPS for cases such as OPENSSL, TCP_WRAPPERS etc
can be set in local.dirdeps-options.mk
which can add to those set in Makefile.depend.options

See share/mk/dirdeps-options.mk

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22469
2019-12-11 17:37:37 +00:00
Sevan Janiyan
08509077b3 Adjust history, info source from v1's manuals
https://www.bell-labs.com/usr/dmr/www/1stEdman.html

MFC after:	5 days
2019-09-04 13:44:46 +00:00
Conrad Meyer
f20b149b45 sort(1): Memoize MD5 computation to reduce repeated computation
Experimentally, reduces sort -R time of a 148160 line corpus from about
3.15s to about 0.93s on this particular system.

There's probably room for improvement using some digest other than md5, but
I don't want to look at sort(1) anymore.  Some discussion of other possible
improvements in the Test Plan section of the Differential.

PR:		230792
Reviewed by:	jhb (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19885
2019-04-13 04:42:17 +00:00
Conrad Meyer
7a590a370a sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1].  For
simplicity, only allow regular file and character device inputs.  For
character devices, only allow /dev/random (and the /dev/urandom symlink).

32 bytes of random is perfectly sufficient to seed MD5; we don't need any
more.  Users that want to use large files as seeds are encouraged to truncate
those files down to an appropriate input file via tools like sha256(1).

(This does not change the sort algorithm of sort -R.)

[1]: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-August/053152.html

PR:		230792
Reported by:	Ali Abdallah <aliovx AT gmail.com>
Relnotes:	yes
2019-04-11 05:08:49 +00:00
Conrad Meyer
74504eefa1 sort(1): Whitespace and style cleanup
No functional change.

Sponsored by:	Dell EMC Isilon
2019-04-11 00:39:06 +00:00
Conrad Meyer
fff4eaebbf sort(1): randomcoll: Skip the memory allocation entirely
There's no reason to order based on strcmp of ASCII digests instead of
memcmp of the raw digests.

While here, remove collision fallback.  If you collide two MD5s, they're
probably the same string anyway.  If robustness against MD5 collisions is
desired, maybe we shouldn't use MD5.

None of the behavior of sort -R is specified by POSIX, so we're free to
implement this however we like.  E.g., using a 128-bit counter and block cipher
to generate unique indices for each line of input.

PR:		230792 (2/many)
Relnotes:	This will change the sort order for a given dataset with a
		given seed.  Other similarly breaking changes are planned.
Sponsored by:	Dell EMC Isilon
2019-04-04 23:32:27 +00:00
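A minimal sketch of comparing raw digests directly, assuming 16-byte MD5 output and a hypothetical helper name. Nothing is allocated and no hex string is built.

```c
#include <assert.h>
#include <string.h>

#define DIGEST_LEN 16	/* size of a raw MD5 digest in bytes */

/* Order two records by their raw digests. */
int
digest_cmp(const unsigned char a[DIGEST_LEN],
    const unsigned char b[DIGEST_LEN])
{
	assert(a != NULL && b != NULL);
	return (memcmp(a, b, DIGEST_LEN));
}
```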
Conrad Meyer
e667e2a480 sort(1): randomcoll: Don't sort on ENOMEM
PR:		230792 (1/many)
Sponsored by:	Dell EMC Isilon
2019-04-04 20:27:13 +00:00
Alex Richardson
101db63b42 Don't use absolute path to sed when building usr.bin/join
This is required to build sort on Linux hosts since sed is in /bin there.

Approved By:	jhb (mentor)
2018-08-23 18:18:43 +00:00
Kyle Evans
7137597e15 sort(1): Fix -m when only implicit stdin is used for input
Observe:

printf "a\nb\nc\n" > /tmp/foo
# Next command results in no output
cat /tmp/foo | sort -m
# Next command results in proper output
cat /tmp/foo | sort -m -
# Also works:
sort -m /tmp/foo

Some const'ification was done to simplify the actual solution of adding "-"
explicitly to the file list if we didn't have any file arguments left over.

PR:		190099
MFC after:	1 week
2018-06-20 03:31:19 +00:00
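The approach described above, adding "-" when no file operands remain, might look roughly like this. The function name and signature are assumptions for illustration and do not mirror the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* File list used when no operands remain: "-" names standard input. */
static const char *const stdin_only[] = { "-" };

/*
 * Pick the list of input files after option parsing.  argc and argv
 * describe the leftover operands.  Returns the number of inputs and
 * stores the list in *filesp.
 */
int
select_inputs(int argc, const char *const *argv,
    const char *const **filesp)
{
	assert(filesp != NULL);
	if (argc == 0) {
		*filesp = stdin_only;
		return (1);
	}
	*filesp = argv;
	return (argc);
}
```

The const-qualified pointer types are what allow a static list containing a string literal to be substituted for argv without casts.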
Kyle Evans
36180cd53d sort(1): Add bits to allow easy checking against NetBSD tests
I'm looking at sort(1) failures, for better or worse.
2018-06-20 03:10:49 +00:00
Mark Johnston
7f180c0f80 Fix the WITH_SORT_THREADS build.
PR:		201664
MFC after:	1 week
2018-02-07 20:36:37 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bryan Drewery
ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Pedro F. Giffuni
759a9a9d24 sort(1): Remove unneeded initializations.
Found by:	Clang static analyzer
2017-02-17 19:53:20 +00:00
Pedro F. Giffuni
692cd1a3b2 sort - Don't live-loop threads.
Worker threads now use a pthread_cond_t to wait for work instead of
burning the cpu up.

Obtained from:	DragonflyBSD (07774aea0ccf64a48fcfad8899e3bf7c8f18277a)
MFC after:	2 weeks
2017-01-23 15:39:51 +00:00
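The waiting pattern mentioned above can be sketched with a counter protected by a mutex and a condition variable. The structure and function names are hypothetical; the real change concerns sort's own worker pool.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical work counter guarded by a mutex and condition variable. */
struct work_queue {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		pending;
};

void
work_queue_init(struct work_queue *q)
{
	assert(q != NULL);
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->pending = 0;
}

/* Producer: publish one item and wake a sleeping worker. */
void
work_queue_post(struct work_queue *q)
{
	pthread_mutex_lock(&q->mtx);
	q->pending++;
	pthread_cond_signal(&q->cv);
	pthread_mutex_unlock(&q->mtx);
}

/* Worker: sleep until an item is available, then claim it. */
void
work_queue_take(struct work_queue *q)
{
	pthread_mutex_lock(&q->mtx);
	while (q->pending == 0)		/* loop guards against spurious wakeups */
		pthread_cond_wait(&q->cv, &q->mtx);
	q->pending--;
	pthread_mutex_unlock(&q->mtx);
}
```

A worker blocked in pthread_cond_wait(3) uses no CPU time until a producer signals, which is the difference from a polling loop.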
Marius Strobl
ed7aec1e45 - Use correct offsets into the keys set array. As the elements of this
  zero-length array are dynamically sized at run-time based on the use
  of hints, compilers can't be expected to figure out these offsets on
  their own. [1]
- Fix incorrect comparison in cmp_nans(). [2]

PR:		204571 [1], 202301 [2]
Submitted by:	David Binderman [2]
MFC after:	3 days
2016-12-28 17:13:03 +00:00
Xin LI
3611de44ef pages and psize are always assigned, so there is no need to initialize
them as zero.

MFC after:	2 weeks
2016-11-28 06:38:41 +00:00
Xin LI
c514c3ed4f Eliminate variables that are computed, assigned but never
used.

MFC after:	2 weeks
2016-11-28 06:36:10 +00:00
Xin LI
665d2db378 Fix an obvious typo.
MFC after:	2 weeks
2016-11-28 06:32:05 +00:00
Gabor Kovesdan
a6be469014 - Fix typo
PR:		211245
Submitted by:	Christoph Schonweiler <public2016@hauptsignal.at>
MFC after:	5 days
2016-09-08 14:50:23 +00:00
Pedro F. Giffuni
80c7cc1c8f Cleanup unnecessary semicolons from utilities we all love.
2016-04-15 22:31:22 +00:00
Baptiste Daroussin
902b9f79f7 Fix some mdoc(7) issues
Obtained from:	DragonflyBSD
2015-10-24 13:43:10 +00:00
Gabor Kovesdan
a7bc18929d -C and -c allow at most one input file. Ensure this is the case when the
input files are specified through --files0-from.

Submitted by:	tim@OpenBSD
Obtained from:	OpenBSD
MFC after:	1 week
2015-10-22 10:57:15 +00:00
Simon J. Gerraty
2ef6d5a7b9 new depends
2015-06-16 23:37:19 +00:00
Simon J. Gerraty
ccfb965433 Add META_MODE support.
Off by default, so the build behaves normally.
With WITH_META_MODE we get automatic objdir creation and the ability to
start a build from anywhere in the tree.

Still need to add real targets under targets/ to build packages.

Differential Revision:       D2796
Reviewed by: brooks imp
2015-06-13 19:20:56 +00:00