22 commits

Author SHA1 Message Date
Robert Clausecker
9a6a587e67 lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE)
implementation was omitted this time as I was not able to get it to
perform adequately.  Best I got was 8% over the scalar version for
long inputs, but slower for short inputs.
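
For readers unfamiliar with the technique, here is a minimal portable C sketch of a comparison that returns a memcmp()-style ordering while always walking every byte. It is not the assembly added by this commit, and the name ct_memcmp_sketch is hypothetical. C makes no timing guarantees, so a compiler may still emit data-dependent code for it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical portable sketch; not the assembly added by this commit.
 * Returns <0, 0, or >0 like memcmp(), but always walks all len bytes.
 */
int
ct_memcmp_sketch(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a, *pb = b;
    int res = 0;    /* ordering of the first differing byte pair */
    int done = 0;   /* 0 until a difference was seen, then -1 (all ones) */

    assert(len == 0 || (a != NULL && b != NULL));

    for (size_t i = 0; i < len; i++) {
        unsigned int x = pa[i], y = pb[i];
        /* The unsigned subtraction wraps iff the left operand is smaller. */
        int lt = (int)(((x - y) >> CHAR_BIT) & 1);  /* 1 iff x < y */
        int gt = (int)(((y - x) >> CHAR_BIT) & 1);  /* 1 iff x > y */
        int cmp = gt - lt;                          /* -1, 0, or 1 */

        res |= cmp & ~done;     /* record only the first difference */
        done |= -(lt | gt);     /* latch once any byte pair differs */
    }

    return (res);
}
```

The loop has no early exit and no branch on the data; later differences are masked out by done instead of being skipped.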

Sponsored by:	The FreeBSD Foundation
Approved by:	security (cperciva)
Inspired by:	https://github.com/moon-chilled/fancy-memcmp
Differential Revision:	https://reviews.freebsd.org/D41696

(cherry picked from commit 5048c1b85506c5e0f441ee7dd98dd8d96d0a4a24)
2023-12-28 18:02:41 +01:00
Robert Clausecker
1347ec5d58 lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
Very straightforward and similar to memcmp(3). The code has
been written to use only instructions specified as having
data operand independent timing by Intel.
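
As an illustration only, the usual portable formulation of such an equality check ORs together the XOR of every byte pair and inspects the accumulator once at the end. The C sketch below uses the hypothetical name ct_bcmp_sketch and is not the code from this commit; unlike the assembly, it cannot promise which instructions the compiler picks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical portable sketch; not the assembly added by this commit.
 * Returns 0 if the buffers are equal and 1 otherwise.
 */
int
ct_bcmp_sketch(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a, *pb = b;
    unsigned int diff = 0;  /* accumulates every differing bit */

    assert(len == 0 || (a != NULL && b != NULL));

    /* No early exit: every byte pair is visited regardless of content. */
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned int)(pa[i] ^ pb[i]);

    return (diff != 0);
}
```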

Sponsored by:	The FreeBSD Foundation
Approved by:	security (cperciva)
Differential Revision:	https://reviews.freebsd.org/D41673

(cherry picked from commit 76c2b331bcd9f73c5c8c43a06e328fa0c7b8c39a)
2023-12-28 18:02:41 +01:00
Robert Clausecker
62f73a711e lib/libc/amd64/string: implement strnlen(3) through memchr(3)
Now that we have an optimised memchr(3), we can use it to implement
strnlen(3) with better performance.
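
The reduction can be sketched in C as follows. The name strnlen_sketch is hypothetical and the committed version is assembly, so treat this as an outline of the idea rather than the actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: strnlen() expressed through memchr().
 * If no NUL byte occurs within the first maxlen bytes, the answer is maxlen.
 */
size_t
strnlen_sketch(const char *s, size_t maxlen)
{
    const char *end;

    assert(s != NULL || maxlen == 0);

    end = memchr(s, '\0', maxlen);

    return (end != NULL ? (size_t)(end - s) : maxlen);
}
```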

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41598

(cherry picked from commit 331737281c1929c29e679e48783055351ac4fbd9)
2023-09-23 14:21:37 -04:00
Robert Clausecker
3f78bde932 lib/libc/amd64/string: add memchr(3) scalar, baseline implementation
This is conceptually similar to strchr(3), but there are
slight changes to account for the buffer having an explicit
buffer length.

This includes the bug fix from b2618b6.

Sponsored by:	The FreeBSD Foundation
Reported by:	yuri, des
Tested by:	des
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
PR:		273652
Differential Revision:	https://reviews.freebsd.org/D41598

(cherry picked from commit de12a689fad271f5a2ba7c188b0b5fb5cabf48e7)
(cherry picked from commit b2618b651b28fd29e62a4e285f5be09ea30a85d4)
2023-09-23 14:20:28 -04:00
Robert Clausecker
39d500190b lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation
This is conceptually very similar to the strcspn(3) implementations
from D41557, but we can't do the fast paths the same way.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41567

(cherry picked from commit 7084133cde6a58412d86bae9f8a55b86141fb304)
2023-09-23 14:20:28 -04:00
Robert Clausecker
feda2297b7 lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation
This changeset adds both a scalar and an x86-64-v2 implementation
of the strcspn(3) function to libc. A baseline implementation does not
appear to be feasible given the requirements of the function.

The scalar implementation is similar to the generic libc implementation,
but expands the bit set into a byte set to reduce latency, improving
performance. This approach could probably be backported to the generic
C version to benefit other platforms.
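
A rough C rendering of the byte-set idea, assuming one table entry per possible byte value; the name strcspn_byteset_sketch is hypothetical and this is not the committed assembly. Membership becomes a single indexed load instead of a word load followed by a shift and mask.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a byte-set strcspn(): one table byte per
 * possible character value instead of one bit.
 */
size_t
strcspn_byteset_sketch(const char *s, const char *set)
{
    unsigned char member[UCHAR_MAX + 1] = { 0 };
    const unsigned char *p;

    assert(s != NULL && set != NULL);

    /* Treat NUL as a member so the scan loop needs only one test. */
    member[0] = 1;
    for (p = (const unsigned char *)set; *p != '\0'; p++)
        member[*p] = 1;

    /* Stop at the first byte that is in the set or terminates s. */
    for (p = (const unsigned char *)s; !member[*p]; p++)
        continue;

    return ((size_t)(p - (const unsigned char *)s));
}
```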

The x86-64-v2 implementation is built around the infamous pcmpistri
instruction. An alternative implementation based on the Muła/Langdale
algorithm [1] was prototyped, but performed worse than the pcmpistri
approach except for sets of more than 16 characters with long input
strings.

All implementations provide special cases for the empty set (reduces to
strlen) as well as single-character sets (reduces to strchr). The
x86-64-v2 kernel falls back to the scalar implementation for sets of
more than 32 characters. This limit could be raised by additional
multiples of 16 through the use of additional pcmpistri code paths, but
I consider this case to be too rare to be of importance.
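
The two special cases can be pictured with the following C sketch. The name strcspn_fastpath_sketch is hypothetical, the general case simply defers to the C library's strcspn(), and the 32-character fallback threshold of the x86-64-v2 kernel is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the fast paths only; the general case is
 * handed to the C library here.
 */
size_t
strcspn_fastpath_sketch(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);

    /* Empty set: nothing can match, so the span is the whole string. */
    if (set[0] == '\0')
        return (strlen(s));

    /* Single-character set: a plain character search suffices. */
    if (set[1] == '\0') {
        const char *hit = strchr(s, set[0]);

        return (hit != NULL ? (size_t)(hit - s) : strlen(s));
    }

    return (strcspn(s, set));
}
```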

This includes the bug fix from 52d4a4d.

[1]: http://0x80.pl/articles/simd-byte-lookup.html

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41557

(cherry picked from commit 474408bb7933f0383a0da2b01e717bfe683ae77c)
(cherry picked from commit 52d4a4d4e0dedc72bc33082a3f84c2d0fd6f2cbb)
2023-09-23 14:19:28 -04:00
Robert Clausecker
9fbea87028 lib/libc/amd64/string/stpcpy.S: add baseline implementation
This commit adds a baseline implementation of stpcpy(3) for amd64.
It performs quite well in comparison to the previous scalar implementation
as well as against bionic and glibc (though glibc is faster for very long
strings).  Fiddle with the Makefile to also have strcpy(3) call into the
optimised stpcpy(3) code, fixing an oversight from D9841.
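
The relationship between the two functions is simple enough to show in C. Both names below are hypothetical stand-ins; the commit wires this up through the Makefile and assembly rather than through C wrappers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for stpcpy(3): copy src and return the new end. */
char *
stpcpy_sketch(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }

    return (dst);   /* points at the terminating NUL in dst */
}

/* strcpy(3) differs only in which pointer it returns. */
char *
strcpy_sketch(char *dst, const char *src)
{
    (void)stpcpy_sketch(dst, src);

    return (dst);
}
```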

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp ngie emaste
Approved by:	mjg kib
Fixes:		D9841
Differential Revision:	https://reviews.freebsd.org/D41349
2023-08-21 20:59:38 +02:00
Warner Losh
d0b2dbfa0e Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:55:03 -06:00
Robert Clausecker
61f4c4d3dd lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)
A lot better than the generic (pre) implementation.  We do not beat glibc
for long strings, likely due to glibc switching to AVX once the input is
sufficiently long.  X86-64-v3 and v4 implementations may be added at a
future time.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strchrnul_pre.out │         strchrnul_scalar.out         │       strchrnul_baseline.out        │
        │      sec/op       │    sec/op     vs base                │   sec/op     vs base                │
Short          129.68µ ± 3%    59.91µ ± 1%  -53.80% (p=0.000 n=20)   44.37µ ± 1%  -65.79% (p=0.000 n=20)
Mid             21.15µ ± 0%    19.30µ ± 0%   -8.76% (p=0.000 n=20)   12.30µ ± 0%  -41.85% (p=0.000 n=20)
Long           13.772µ ± 0%   11.028µ ± 0%  -19.92% (p=0.000 n=20)   3.285µ ± 0%  -76.15% (p=0.000 n=20)
geomean         33.55µ         23.36µ       -30.37%                  12.15µ       -63.80%

        │ strchrnul_pre.out │          strchrnul_scalar.out          │         strchrnul_baseline.out         │
        │        B/s        │      B/s       vs base                 │      B/s       vs base                 │
Short          919.3Mi ± 3%   1989.7Mi ± 1%  +116.45% (p=0.000 n=20)   2686.8Mi ± 1%  +192.28% (p=0.000 n=20)
Mid            5.505Gi ± 0%    6.033Gi ± 0%    +9.60% (p=0.000 n=20)    9.466Gi ± 0%   +71.97% (p=0.000 n=20)
Long           8.453Gi ± 0%   10.557Gi ± 0%   +24.88% (p=0.000 n=20)   35.441Gi ± 0%  +319.26% (p=0.000 n=20)
geomean        3.470Gi         4.983Gi        +43.62%                   9.584Gi       +176.22%

For comparison, glibc on the same machine:

        │ strchrnul_glibc.out │
        │       sec/op        │
Short             49.73µ ± 0%
Mid               14.60µ ± 0%
Long              1.237µ ± 0%
geomean           9.646µ

        │ strchrnul_glibc.out │
        │         B/s         │
Short            2.341Gi ± 0%
Mid              7.976Gi ± 0%
Long             94.14Gi ± 0%
geomean          12.07Gi

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
Differential Revision: https://reviews.freebsd.org/D41333
2023-08-06 15:58:27 +02:00
Robert Clausecker
ad2fac552c lib/libc/amd64: add archlevel-based simd dispatch framework
Add a framework for selecting from one of multiple implementations
of a function based on amd64 architecture level (cf. amd64 SysV
ABI supplement).
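
To make the idea concrete, here is a small C sketch of level-based selection. Every identifier in it is hypothetical, the "baseline" routine is just a stand-in that calls strlen(), and how the CPU's level is detected and when the choice is made are left out; the actual framework may be organised quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical level names, ordered from least to most capable. */
enum archlevel_sketch {
    LEVEL_SCALAR, LEVEL_BASELINE, LEVEL_V2, LEVEL_V3, LEVEL_V4, LEVEL_COUNT
};

typedef size_t (*strlen_fn)(const char *);

static size_t
strlen_scalar_sketch(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return (n);
}

/* Stand-in for a SIMD routine; a real one would be written in assembly. */
static size_t
strlen_baseline_sketch(const char *s)
{
    return (strlen(s));
}

/* Levels without a dedicated implementation stay NULL. */
const strlen_fn strlen_impls_sketch[LEVEL_COUNT] = {
    [LEVEL_SCALAR] = strlen_scalar_sketch,
    [LEVEL_BASELINE] = strlen_baseline_sketch,
};

/* Pick the most capable implementation not exceeding the detected level. */
strlen_fn
select_impl_sketch(const strlen_fn impls[LEVEL_COUNT],
    enum archlevel_sketch detected)
{
    assert((int)detected >= 0 && (int)detected < LEVEL_COUNT);

    for (int level = (int)detected; level >= 0; level--)
        if (impls[level] != NULL)
            return (impls[level]);

    return (NULL);
}
```

With this table, a CPU detected at LEVEL_V3 gets the baseline routine because no v2 or v3 variant is registered.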

Sponsored by:	The FreeBSD Foundation
Approved by:	kib
Reviewed by:	jrtc27
Differential Revision:	https://reviews.freebsd.org/D40693
2023-08-04 01:53:43 +03:00
Mateusz Guzik
fbc002cb72 amd64: bring back asm bcmp, shared with memcmp
Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to
bcmp calls.

Reviewed by:	emaste (previous version), jhb (previous version)
Differential Revision:	https://reviews.freebsd.org/D34673
2022-03-26 09:10:03 +00:00
Mateusz Guzik
5fc3cc2713 amd64: make bcmp in libc just call memcmp
Preferably bcmp would just alias memcmp but there is build magic which
makes this problematic.
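
In C terms the forwarding amounts to the sketch below. The name bcmp_sketch is hypothetical, and normalising the result to 0 or 1 is a choice made here; bcmp() is only required to return zero for equal buffers and non-zero otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: bcmp() needs only equality, so memcmp() suffices. */
int
bcmp_sketch(const void *a, const void *b, size_t len)
{
    assert(len == 0 || (a != NULL && b != NULL));

    return (memcmp(a, b, len) != 0);
}
```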

Reviewed by:	jhb
Differential Revision:		https://reviews.freebsd.org/D28846
2022-03-12 14:59:14 +00:00
Mateusz Guzik
7f06b217c5 amd64: import asm strlen into libc
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28845
2021-02-23 00:09:55 +00:00
Mateusz Guzik
6fff634455 amd64: convert libc bzero to a C func to avoid future bloat
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17549
2018-11-15 20:20:39 +00:00
Mateusz Guzik
9c7d70ee7d amd64: convert libc bcopy to a C func to avoid future bloat
The function is of limited use and is almost a direct clone of
memmove/memcpy (with arguments swapped). Introduction of ERMS variants
of string routines would mean avoidable growth of libc.

bcopy will get redefined to a __builtin_memmove later on with this
symbol only left for compatibility.
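
A C version along these lines could look like the sketch below; the name bcopy_sketch is hypothetical and the code is illustrative rather than the file added by this commit. Note the argument order: bcopy() takes the source first, memmove() the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: bcopy() is memmove() with src and dst swapped. */
void
bcopy_sketch(const void *src, void *dst, size_t len)
{
    assert(len == 0 || (src != NULL && dst != NULL));

    /* memmove() handles overlapping buffers, as bcopy() must. */
    memmove(dst, src, len);
}
```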

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17539
2018-10-13 21:17:28 +00:00
Brooks Davis
9fe44df287 Correct MDSRCS use in <arch>/string/Makefile.inc.
- Remove .c files which duplicate entries in MISRCS.
- Use the same, less merge conflict prone style in all cases.
- Use MDSRCS for mips (.c and .S files both ended up in SRCS).
- Remove pointless sparc64 Makefile.inc.
- Remove uninformative foreign VCS ID entries.

Reviewed by:	emaste, imp, jhb
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9841
2017-03-02 17:05:52 +00:00
George V. Neville-Neil
c03b5ad6a9 Make both stpcpy and strcpy be assembly language implementations
on amd64.

Submitted by:	Guillaume Morin (guillaume at morinfr.org)
Reviewed by:	kib, jhb
Approved by:	re (bz)
MFC after:	1 month
2011-07-21 16:32:13 +00:00
Alan Cox
7e266fcd1f Add a machine-specific, optimized implementation of strcat.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-10 18:58:49 +00:00
Alan Cox
6524eb94a1 Add a machine-specific, optimized implementation of strcpy.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-10 05:11:06 +00:00
Alan Cox
e5dd4df84c Add a machine-specific, optimized implementation of strcmp.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-09 20:47:08 +00:00
Alan Cox
26f6218be9 Add machine-specific, optimized implementations of bcmp and memcmp.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-08 05:15:55 +00:00
Alan Cox
91c09a383a Add machine-specific, optimized implementations of bcopy, bzero, memcpy,
memmove, and memset.

PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-07 03:56:03 +00:00