Officially since C11 (and in reality FreeBSD since 3.0 with commit
1b46cb523d) errno has been defined to be a macro. Rename the symbol
to __libsys_errno and move it to FBSDprivate_1.0 and confine it entierly
to libsys for use by libthr. Add a FBSD_1.0 compat symbol for existing
binaries that were incorrectly linked to the errno symbol during
libc.so.7's lifetime.
This deliberately breaks linking software that directly links to errno.
Such software is broken and will fail in surprising ways if it becomes
threaded (e.g., if it triggers loading of a pam or nss module that
uses threads.)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46780
The function is called by rtld with the rtld bind lock write-locked,
when fixing the stack permission during dso load. Not every ARMv7 CPU
supports the div, which causes the recursive entry into rtld to resolve
the __aeabi_uidiv symbol, causing self-lock.
Workaround the problem by using roundup2() instead of open-coding less
efficient formula.
Diagnosed by: mmel
Based on submission by: John F Carr <jfc@mit.edu>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add the ability to pre-resolve architecture-specific EABI symbols and
use it on arm for selected EABI functions. These functions can be called
with rtld bind lock write-locked, so they should be resolved in forward.
Reported by: Mark Millard <marklmi@yahoo.com>, John F Carr <jfc@mit.edu>
Reviewed by: kib, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46104
Ahead of including inline in __always_inline, move __always_inline to
where inline goes.
Reviewed by: imp, kib, olce
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45710
The use of SYS_sigreturn is unnecessary here.
If handle_signal is called when a signal is delivered, it can just
return normally back to sigcode which will call sigreturn anyway.
In case handle_signal is called by check_deferred_signal, using
setcontext is better than SYS_sigreturn because that is the correct
system call to pair with the earlier getcontext.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D44893
Align these signatures with the ones in syscalls.master (and thus
libsys.h). There's no reason to do va_args twice and in some ABIs
(e.g,, CheriABI) you can't access fixed arguments as varargs if you
weren't called with varargs signature.
Reviewed by: imp, kib, jhibbits
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D45126
Use __libc_interposing_slot() in favor of __libsys_interposing_slot() so
that the interposing interface is entierly between libc and libthr with
libsys only involved as an implementation detail.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44880
The (optional) third argument of fcntl is sometimes a pointer so change
the type to intptr_t. Update the libc-internal defintion (actually used
by libthr) to take a fixed intptr_t argument rather than pretending it's
a variadic function. (That worked because all supported architectures
pass variadic arguments as though the function was declared with those
types. In CheriBSD that changes because variadic arguments are passed
via a bounded array.)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44381
The function was renamed to _thr_cond_timedwait in commit 0ab1bfc7b2
and for some reason did not get the same __weak_reference treatment as
other _pthread_cond symbols.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44244
Declare in sys/umtx.h and implement in libsys. Explicitly link libthr
with libsys.
When building libthr static include _umtx_op_err so we don't break static
linkage with -lpthread.
Reviewed by: kib, emaste, imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
System calls or their wrappers are now interposed by
__libsys_interposing with purely libc entries remaining in
__libc_interposing.
Use __libsys_interposing_slot in libthr to update __libsys_interposing,
but also make __libc_interposing_slot fall back to
__libsys_interposing_slot so an out of date libc has a chance of working
during updates.
Reviewed by: kib, emaste, imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
Otherwise the lock upgrade performed by rtld's load_filtees() can result
in infinite recursion, wherein:
1. _rtld_bind() acquires the bind read lock,
2. the source DSO's filtees haven't been loaded yet, so the lock upgrade
in load_filtees() cause rtld to jump to _rtld_bind() and release the
bind lock,
3. _thr_rtld_lock_release() calls _thr_ast(), which calls thr_wake(),
which hasn't been resolved yet,
4. _rtld_bind() acquires the bind read lock in order to resolve
thr_wake(),
5. ...
See the linked pull request for an instance of this problem arising with
libsys. That particular instance is also worked around by commit
e7951d0b04.
Reported by: brooks
Reviewed by: kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
MFC after: 1 week
Sponsored by: Innovate UK
Similarly as in the previous commit, using calloc() instead of malloc()
is useless here in the regular case since the subsequent call to
cpuset_getaffinify() is going to completely fill the allocated memory.
However, there is an additional complication. This function tries to
allocate memory to hold the cpuset if it previously wasn't, and does so
before the thread lock is acquired, which can fail on a bad thread ID.
In this case, it is necessary to deallocate the memory allocated in this
function so that the attributes object appears unmodified to the caller
when an error is returned. Without this, a subsequent call to
pthread_attr_getaffinity_np() would expose uninitialized memory (not
a security problem per se, since it comes from the same process) instead
of returning a full mask as it would before the failing call to
pthread_attr_get_np(). So the caller would be able to notice a change
in the state of the attributes object even if pthread_attr_get_np()
reported failure, which would be quite surprising. A similar problem
that could occur on failure of cpuset_setaffinity() has been fixed.
Finally, we shall always report memory allocation failure. This already
goes for pthread_attr_init(), so, if for nothing else, just be
consistent.
Reviewed by: emaste, kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43329
Using calloc() instead of malloc() is useless here since the allocated
memory is to be wholly crushed by the memcpy() call that follows.
Suggested by: kib
Reviewed by: emaste, kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43328
The change of argument for sizeof() (from a type to an object) is to be
consistent with the change done for the malloc() code just above in the
preceding commit touching this file.
Consider bit flags as integers and test whether they are set with an
explicit comparison with 0.
Use an explicit flag value (PTHREAD_SCOPE_SYSTEM) in place of a variable
that has this value at point of substitution.
All other changes are straightforward.
Suggested by: kib
Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43327
On first read, POSIX may seem ambiguous about the return code for some
scheduling-related pthread functions on invalid arguments. But a more
thorough reading and a bit of standards archeology strongly suggests
that this case should be handled by EINVAL and that ENOTSUP is reserved
for implementations providing only part of the functionality required by
the POSIX option POSIX_PRIORITY_SCHEDULING (e.g., if an implementation
doesn't support SCHED_FIFO, it should return ENOTSUP on a call to, e.g.,
sched_setscheduler() with 'policy' SCHED_FIFO).
This reading is supported by the second sentence of the very definition
of ENOTSUP, as worded in CAE/XSI Issue 5 and POSIX Issue 6: "The
implementation does not support this feature of the Realtime Feature
Group.", and the fact that an additional ENOTSUP case was added to
pthread_setschedparam() in Issue 6, which introduces SCHED_SPORADIC,
saying that pthread_setschedparam() may return it when attempting to
dynamically switch to SCHED_SPORADIC on systems that doesn't support
that.
glibc, illumos and NetBSD also support that reading by always returning
EINVAL, and OpenBSD as well, since it always returns EINVAL but the
corresponding code has a comment suggesting returning ENOTSUP for
SCHED_FIFO and SCHED_RR, which it effectively doesn't support.
Additionally, always returning EINVAL fixes inconsistencies where EINVAL
would be returned on some out-of-range values and ENOTSUP on others.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43006
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
pthread_getname_np needs to be provided by libc in order to import
jemalloc 5.3.0.
A stub implementation for libc pthread_getname_np() is added for
_pthread_stubs.c, which always reports empty name for the main thread.
Internal _pthread_getname_np() is not exported, but provided for libc
own use.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41461
The acquisition and release of an uncontended default/normal pthread
mutex on FreeBSD is suprisingly slow, e.g., pthread wrlocks and binary
semaphores both exhibit roughly 33% lower latency, while default/normal
mutexes on Linux exhibit roughly 67% lower latency than FreeBSD. This is
likely explained by the fact that AFAICT in the best case to acquire an
uncontended mutex on Linux one need touch only 1 page and read+modify
only 1 cacheline, whereas on FreeBSD we need to touch at least 4 pages,
read 6 cachelines, and modify at least 4 cachelines.
This patch does not address the pthread mutex architecture. Instead,
it improves performance by adding the __always_inline attribute to
mutex_lock_common() and mutex_unlock_common() to encourage constant
folding and propagation, thereby lowering the latency to acquire and
release a mutex due to a shorter code path with fewer compares, jumps,
and mispredicts.
With this patch on a stock build I see a reduction in latency of roughly
7% for default/normal mutexes, and 17% for robust mutexes. When built
without PTHREADS_ASSERTIONS enabled I see a reduction in latency of
roughly 15% and 26%, respectively. Suprisingly, I see similar reductions
in latency for heavily contended mutexes.
By default, this patch increases the size of libthr.so.3 by 2448 bytes,
but when built without PTHREAD_ASSERTIONS enabled it only increases by
448 bytes.
Reviewed by: jhb (previous version), kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40912
This patch fixes a bug which prevents building libthr without
_PTHREADS_INVARIANTS defined. The default remains to build libthr
with -D_PTHREADS_INVARIANTS. However, with this patch, if one builds
libthr with WITHOUT_PTHREADS_ASSERTIONS=true then the latency to
acquire+release a default pthread mutex is reduced by roughly 5%, and a
robust mutex by roughly 18% (as measured by a simple synthetic test on a
Xeon E5-2697a based machine).
Reviewed by: jhb, kib, mjg
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40900
Reduces severe performance degradation due to false-sharing. Note that this
does not account for hardware which can perform adjacent cacheline prefetch.
[mjg: massaged the commit message and the patch to use aligned_alloc
instead of malloc]
PR: 272238
MFC after: 1 week
Since there is only the current thread in the child, no pending readers
exist. Clear the bit, since it confuses future attempts to acquire
write ownership of the rtld locks, due to URWLOCK_PREFER_READERS flag.
To be future-proof, clear all state about pending writers and readers.
PR: 271490
Reported and tested by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40178
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
When __thr_pshared_offpage() is called for allocation, it must not use
the cached offpage for the key. Instead, the cached offpage must be
unmapped and removed from the cache, if any.
It is legitimate for the user code to unmap the shared lock object without
destroying it, and then mapping something over the freed VA to carry
another shared lock. In this case the cached offpage must be un-cached.
PR: 269277
Reported by: rau8344@gmail.com
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38345
Since f35093f8 semantics of a thread affinity functions is changed to be a
compatible with Linux:
In case of getaffinity(), the minimum cpuset_t size that the kernel permits is
the maximum CPU id, present in the system, / NBBY bytes, the maximum size is not
limited.
In case of setaffinity(), the kernel does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where the upper
bound is the maximum CPU id, present in the system, no larger than the size of
the kernel cpuset_t.
To match pthread_attr_[g|s]etaffinity_np checks of the user-provided cpusets to
the kernel behavior export the minimum cpuset_t size allowed by running kernel
via new sysctl kern.sched.cpusetsizemin and use it in checks.
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D38112
MFC after: 1 week
In libthr we use PAGE_SIZE when allocating memory with mmap and to check
various structs will fit into a single page so we can use this allocator
for them.
Ask the kernel for the page size on init for use by the page allcator
and add a new machine dependent macro to hold the smallest page size
the architecture supports to check the structure is small enough.
This allows us to use the same libthr on arm64 with either 4k or 16k
pages.
Reviewed by: kib, markj, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34984
Rather than calling getpagesize() twice use the value saved after the
first call to size a mmap allocation.
Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34983
This matches the type in other unwind headers (LLVM libunwind,
libcxxrt, glibc).
NB: include/unwind.h is not installed but is only used by libthr
Reviewed by: imp, dim, emaste
Differential Revision: https://reviews.freebsd.org/D34049