The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.
The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree. MD specifics can be
refined individually afterwards.
The generic implementations may not be ideal for your architecture; feel
free to implement better versions. If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name. To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.
Include _atomic_subword.h in arm and arm64 machine/atomic.h.
For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long. There are other questionably disabled primitives,
but I didn't run into them, so I left them alone. KCSAN is only built for
amd64 in tinderbox for now.
Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.
Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().
On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.
Reviewed by: kevans, rlibby (previous versions both)
Differential Revision: https://reviews.freebsd.org/D22963
This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
are used to generate all the flavors from the same set of instructions; the
macro expansion handles the couple minor differences between each size
variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
In addition to handling new sizes, the instruction sequences used for cmpset
and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
will not return false when *dst==*old but the store-exclusive fails because
of concurrent writers. Instead, it just loops like ldrex/strex sequences
normally do until it gets a non-conflicted store. The manpage allows LL/SC
architectures to bogusly return false, but there's no reason to actually do
so, at least on arm.
Reviewed by: cognet
- Add an implementation of atomic_fcmpset_32() using RAS for armv4/v5.
This fixes recent world breakage due to use of atomic_fcmpset() in
userland.
- While here, be more careful to not expose wrapper macros for 64-bit
atomic_*cmpset to userland for armv4/v5 as only 32-bit cmpset is
implemented.
This has been reviewed, but not runtime-tested, but should fix the arm.arm
and arm.armeb worlds that have been broken for a while.
Reviewed by: imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15147
The return value of atomic_cmpset() and atomic_fcmpset() is an int (which
is really a bool) that has the values 0 or 1. Some of the inlines were
using the type being operated on (e.g. uint32_t) as either the return type
of the function, or the type of a local 'ret' variable used to hold the
return value. Fix all of these to just use plain 'int'. Due to C promotion
rules and the fact that the value can only be 0 or 1, these should all be
harmless.
Reviewed by: imp (only the v4 ones)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D15147
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193
self-consistent, there is no need in anything but compiler barrier in
the implementation of atomic_thread_fence_*() on ARMv5. Split
implementation of fences for ARMv4/5 and ARMv6; the former use
compiler barriers, the later also perform hardware barriers.
An issue which is fixed by the change is the faults from the CP15
coprocessor accesses in the user mode. This was uncovered by the
pthread_once() changes in r287556.
Reported by: Mattia Rossi <mattia.rossi.mailinglists@gmail.com>
Discussed with: alc, cognet, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week