The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
This reverts commit 710e45c4b8.
It breaks for some corner cases on big endian ppc64.
Given the stage of the release process it is best to revert for now.
Reported by: jhibbits
The previous code neglected to use primitives which can find the end
of the string without having to branch on every character.
While here augment the somewhat misleading commentary -- strlen as
implemented here leaves performance on the table, especially so for
userspace. Every arch should get a dedicated variant instead.
In the meantime this commit lessens the problem.
Tested with glibc test suite.
Naive test just calling strlen in a loop on Haswell (ops/s):
$(perl -e "print 'A' x 3"):
before: 211198039
after: 338626619
$(perl -e "print 'A' x 100"):
before: 83151997
after: 98285919
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
blog posting [1].
- Use word-sized test for unaligned pointer before working
the hard way.
Memory page boundary is always integral multiple of a word
alignment boundary. Therefore, if we can access memory
referenced by pointer p, then (p & ~word mask) must be also
accessible.
- Better utilization of multi-issue processor's ability of
concurrency.
The previous implementation utilized a formular that must be
executed sequentially. However, the ~, & and - operations can
actually be caculated at the same time when the operand were
different and unrelated.
The original Hacker's Delight formular also offered consistent
performance regardless whether the input would contain
characters with their highest-bit set, as it catches real
nul characters only.
These two optimizations has shown further improvements over the
previous implementation on microbenchmarks on i386 and amd64 CPU
including Pentium 4, Core Duo 2 and i7.
[1] http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/03/08#strlen_1
MFC after: 1 month
reducing branches and doing word-sized operation.
The idea is taken from J.T. Conklin's x86_64 optimized version of strlen(3)
for NetBSD, and reimplemented in C by me.
Discussed on: -arch@