Commit graph

21 commits

Author SHA1 Message Date
Conrad Meyer
ca0ec73c11 Expand generic subword atomic primitives
The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.

The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree.  MD specifics can be
refined individually afterwards.

The generic implementations may not be ideal for your architecture; feel
free to implement better versions.  If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name.  To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.

Include _atomic_subword.h in arm and arm64 machine/atomic.h.

For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long.  There are other questionably disabled primitives,
but I didn't run into them, so I left them alone.  KCSAN is only built for
amd64 in tinderbox for now.

Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.

Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().

On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.

Reviewed by:	kevans, rlibby (previous versions both)
Differential Revision:	https://reviews.freebsd.org/D22963
2020-03-25 23:12:43 +00:00
Mark Johnston
a83c682b36 Dynamically select LSE-based atomic(9)s on arm64.
Once all CPUs are online, determine if they all support LSE atomics and
set lse_supported to indicate this.  For now the atomic(9)
implementations are still always inlined, though it would be preferable
to create out-of-line functions to avoid text bloat.  This was not done
here since big.little systems exist in which some CPUs implement LSE
while others do not, and ifunc resolution must occur well before this
scenario can be detected.  It does seem unlikely that FreeBSD will
ever run on such platforms, however, so converting atomic(9) to use
ifuncs is probably a good next step.

Add a LSE_ATOMICS arm64 kernel configuration option to unconditionally
select LSE-based atomic(9) implementations when the target system is
known.

Reviewed by:	andrew, kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation, Amazon (hardware)
Differential Revision:	https://reviews.freebsd.org/D23325
2020-02-03 18:23:50 +00:00
Mark Johnston
920de6a15f Add LSE-based atomic(9) implementations.
These make use of the cas*, ld* and swp instructions added in ARMv8.1.
Testing shows them to be significantly more performant than LL/SC-based
implementations.

No functional change here since the wrappers still unconditionally
select the _llsc variants.

Reviewed by:	andrew, kib
MFC after:	1 month
Submitted by:	Ali Saidi <alisaidi@amazon.com> (original version)
Differential Revision:	https://reviews.freebsd.org/D23324
2020-02-03 18:23:35 +00:00
Mark Johnston
c1fced6800 Add wrappers for arm64 atomics.
Add a _llsc suffix for the existing LL/SC-based implementations and add
trivial wrappers.  This is in preparation for supporting LSE-based
atomic(9) implementations.

No functional change intended.

Reviewed by:	andrew, kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation, Amazon (hardware)
Differential Revision:	https://reviews.freebsd.org/D23323
2020-02-03 18:23:14 +00:00
Mark Johnston
3ad6c736cf Provide a single implementation for each of the arm64 atomic(9) ops.
Parameterize the macros by type width as well as acq/rel semantics.
This makes modifying the implementations much less tedious and
error-prone and makes it easier to support alternate LSE-based
implementations.  No functional change intended.

Reviewed by:	andrew, kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation, Amazon (hardware)
Differential Revision:	https://reviews.freebsd.org/D23322
2020-02-03 18:22:59 +00:00
Andrew Turner
f2792e5e55 Add atomic_testandset/clear on arm64.
These will reportedly be used in future uma changes.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23019
2020-01-09 10:26:36 +00:00
Andrew Turner
6d26116baa Add the 8 and 16 bit atomic load/store functions with a barrier on arm64.
Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22966
2020-01-03 10:03:36 +00:00
Andrew Turner
849aef496d Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.

This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22315
2019-11-21 11:22:08 +00:00
Andrew Turner
4ffa494e4f Add more 8 and 16 bit variants of the the atomic(9) functions on arm64.
These are direct copies of the 32 bit functions, adjusted ad needed.
While here fix atomic_fcmpset_16 to use the valid load and store exclusive
instructions.

Sponsored by:	DARPA, AFRL
2019-11-07 17:34:44 +00:00
Emmanuel Vadot
44654b755d arm64: fix atomic_fcmpset_16
newval needs to be uint16_t

Reported by:	andrew
2018-05-28 21:05:00 +00:00
Emmanuel Vadot
e39ce4cafb arm64: Add atomic_fcmpset_8 and atomic_fcmpset_16
Reviewed by:	cognet
2018-05-28 20:29:03 +00:00
Konstantin Belousov
30d4f9e888 Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic.  Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses.  The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.

The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations.  It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.

Suggested by:	jhb
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 09:59:20 +00:00
Andrew Turner
2da3e34e0a Remove a blank line accidentally added in r320403. 2017-06-30 14:45:43 +00:00
Andrew Turner
c90baf6817 Some of the atomic_clear_* functions were incorrectly defined to be an
atomic add. Correct these, fixing a NULL-pointer dereference in netgraph.

PR:		220273
MFC after:	3 days
Sponsored by:	DARPA, AFRL
2017-06-27 10:45:13 +00:00
Olivier Houchard
dc5f9fcdae Implement atomic_fcmpset_* for arm and arm64. 2017-01-28 16:24:06 +00:00
Andrew Turner
119a353e3d Rework the atomic code to reduce the repetition. This merges some of the
atomic functions where they are almost identical, or have acquire/release
semantics.

While here clean these function up. The cbnz instruction doesn't change
the condition flags so drop cc, however they should have memory added to the
clobber list.

Reviewed by:	kib
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D4318
2015-12-01 12:27:36 +00:00
Andrew Turner
9b8c3c4f0b Add more atomic_swap_* functions.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:34:43 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Zbigniew Bodek
458f2175ca Add options to dmb() and dsb() macros on ARM64
Using plain dsb()/dmb() as full system barriers is usually to much.
Adding proper options to those barriers (instead of full system - sy)
will most likely reduce the cost of the instructions and will benefit
in performance improvement.
This commit adds options to barrier macro definitions.

Obtained from: Semihalf
Reviewed by:   andrew, ian
Sponsored by:  The FreeBSD Foundation
2015-06-09 23:54:20 +00:00
Andrew Turner
46f52b028f Split out the _acq and _rel functions. These were the same, but there is
no need for them to be this strong, we only need to provide one or the
other.

While here replace atomic_load_acq_* and atomic_store_rel_* with a single
instruction version, and fix the definition of atomic_clear_* to point to
the correct functions.

Sponsored by:	The FreeBSD Foundation
2015-04-06 16:27:22 +00:00
Andrew Turner
412042e2ae Add the start of the arm64 machine headers. This is the subset needed to
start getting userland libraries building.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
2015-03-23 11:54:56 +00:00