The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.
The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree. MD specifics can be
refined individually afterwards.
The generic implementations may not be ideal for your architecture; feel
free to implement better versions. If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name. To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.
Include _atomic_subword.h in arm and arm64 machine/atomic.h.
For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long. There are other questionably disabled primitives,
but I didn't run into them, so I left them alone. KCSAN is only built for
amd64 in tinderbox for now.
Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.
Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().
On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.
Reviewed by: kevans, rlibby (previous versions both)
Differential Revision: https://reviews.freebsd.org/D22963
Once all CPUs are online, determine if they all support LSE atomics and
set lse_supported to indicate this. For now the atomic(9)
implementations are still always inlined, though it would be preferable
to create out-of-line functions to avoid text bloat. This was not done
here since big.little systems exist in which some CPUs implement LSE
while others do not, and ifunc resolution must occur well before this
scenario can be detected. It does seem unlikely that FreeBSD will
ever run on such platforms, however, so converting atomic(9) to use
ifuncs is probably a good next step.
Add a LSE_ATOMICS arm64 kernel configuration option to unconditionally
select LSE-based atomic(9) implementations when the target system is
known.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23325
These make use of the cas*, ld* and swp instructions added in ARMv8.1.
Testing shows them to be significantly more performant than LL/SC-based
implementations.
No functional change here since the wrappers still unconditionally
select the _llsc variants.
Reviewed by: andrew, kib
MFC after: 1 month
Submitted by: Ali Saidi <alisaidi@amazon.com> (original version)
Differential Revision: https://reviews.freebsd.org/D23324
Add a _llsc suffix for the existing LL/SC-based implementations and add
trivial wrappers. This is in preparation for supporting LSE-based
atomic(9) implementations.
No functional change intended.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23323
Parameterize the macros by type width as well as acq/rel semantics.
This makes modifying the implementations much less tedious and
error-prone and makes it easier to support alternate LSE-based
implementations. No functional change intended.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23322
These will reportedly be used in future uma changes.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23019
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.
This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22315
These are direct copies of the 32 bit functions, adjusted ad needed.
While here fix atomic_fcmpset_16 to use the valid load and store exclusive
instructions.
Sponsored by: DARPA, AFRL
They provide relaxed-ordered atomic access semantic. Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses. The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.
The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations. It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.
Suggested by: jhb
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
atomic functions where they are almost identical, or have acquire/release
semantics.
While here clean these function up. The cbnz instruction doesn't change
the condition flags so drop cc, however they should have memory added to the
clobber list.
Reviewed by: kib
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D4318
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Using plain dsb()/dmb() as full system barriers is usually to much.
Adding proper options to those barriers (instead of full system - sy)
will most likely reduce the cost of the instructions and will benefit
in performance improvement.
This commit adds options to barrier macro definitions.
Obtained from: Semihalf
Reviewed by: andrew, ian
Sponsored by: The FreeBSD Foundation
no need for them to be this strong, we only need to provide one or the
other.
While here replace atomic_load_acq_* and atomic_store_rel_* with a single
instruction version, and fix the definition of atomic_clear_* to point to
the correct functions.
Sponsored by: The FreeBSD Foundation