Commit graph

53 commits

Author SHA1 Message Date
Brandon Bergren
9941cb0657 [PowerPC] Fix atomic_cmpset_masked().
A recent kernel change caused the previously unused atomic_cmpset_masked() to
be used.

It had a typo in it.

Instead of reading the old value from an uninitialized variable, read it
from the passed-in pointer as intended.

This fixes crashes on 64 bit Book-E.

Obtained from:	jhibbits
2020-05-26 19:03:45 +00:00
Conrad Meyer
ca0ec73c11 Expand generic subword atomic primitives
The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.

The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree.  MD specifics can be
refined individually afterwards.

The generic implementations may not be ideal for your architecture; feel
free to implement better versions.  If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name.  To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.

Include _atomic_subword.h in arm and arm64 machine/atomic.h.

For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long.  There are other questionably disabled primitives,
but I didn't run into them, so I left them alone.  KCSAN is only built for
amd64 in tinderbox for now.

Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.

Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().

On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.

Reviewed by:	kevans, rlibby (previous versions both)
Differential Revision:	https://reviews.freebsd.org/D22963
2020-03-25 23:12:43 +00:00
Brandon Bergren
9aafc7c052 [PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.

This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.

The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.

Submitted by:	jhibbits (original patch), kevans (mips bits)
Reviewed by:	jhibbits, jeff, kevans
Differential Revision:	https://reviews.freebsd.org/D22976
2020-01-02 23:20:37 +00:00
Justin Hibbits
d0bdb11139 atomic: Add atomic_cmpset_masked to powerpc and use it
Summary:
This is a more optimal way of doing atomic_compset_masked() than the
fallback in sys/_atomic_subword.h.  There's also an override for
_atomic_fcmpset_masked_word(), which may or may not be necessary, and is
unused for powerpc.

Reviewed by:	kevans, kib
Differential Revision:	https://reviews.freebsd.org/D22359
2019-11-15 04:33:07 +00:00
Justin Hibbits
9551397f51 powerpc/atomic: Fix atomic_cmpset_rel()
Need a release barrier, not an acquire barrier, else bad things happen.
2019-10-15 03:37:21 +00:00
Justin Hibbits
84046d16eb powerpc: Implement atomic_(f)cmpset_ for short and char
|
This adds two implementations for each atomic_fcmpset_ and atomic_cmpset_
short and char functions, selectable at compile time for the target
architecture.  By default, it uses a generic shift-and-mask to perform atomic
updates to sub-components of 32-bit words from <sys/_atomic_subword.h>.
However, if ISA_206_ATOMICS is defined it uses the ll/sc instructions for
halfword and bytes, introduced in PowerISA 2.06.  These instructions are
supported by all IBM processors from POWER7 on, as well as the Freescale/NXP
e6500 core.  Although the e5500 and e500mc both implement PowerISA 2.06 they
do not implement these instructions.

As part of this, clean up the atomic_(f)cmpset_acq and _rel wrappers, by
using macros to reduce code duplication.

ISA_206_ATOMICS requires clang or newer binutils (2.20 or later).

Differential Revision:	https://reviews.freebsd.org/D21682
2019-10-08 01:36:34 +00:00
Justin Hibbits
e44ed9d3d4 powerpc/atomic: Follow recommendations on atomic primitive comparisons
Both IBM and Freescale programming examples presume the cmpset operands will
favor equal, and pessimize the non-equal case instead.  Do the same for
atomic_cmpset_* and atomic_fcmpset_*.  This slightly pessimizes the failure
case, in favor of the success case.

MFC after:	3 weeks
2019-09-25 01:39:58 +00:00
Hans Petter Selasky
d7a9bfee8f Implement atomic_swap_xxx() for all platforms.
Differential Revision:	https://reviews.freebsd.org/D18450
Reviewed by:		kib@
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-12-10 13:38:13 +00:00
Justin Hibbits
6a0fd1a51b powerpc/atomic: Loosen the memory barrier on atomic_load_acq_*()
'sync' is pretty heavy-handed, and is unnecessary for this use case.  It's a
full barrier, which is applicable for all storage types.  However,
atomic_load_acq_*() is only expected to operate on physical memory, not
device memory, so lwsync is sufficient (lwsync provides access ordering on
memory that is marked as Coherency Required and is not Write Through nor
Cache Inhibited).  On 32-bit systems, this is a nop, since powerpc_lwsync()
is defined to use sync, as a workaround for a silicon bug in the Freescale
e500 core.
2018-11-07 01:42:00 +00:00
Konstantin Belousov
30d4f9e888 Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic.  Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses.  The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.

The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations.  It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.

Suggested by:	jhb
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 09:59:20 +00:00
Pedro F. Giffuni
71e3c3083b sys/powerpc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:09:59 +00:00
Justin Hibbits
d3a8234cef Don't retry a lost reservation in atomic_fcmpset()
The desired behavior of atomic_fcmpset_() is to always exit on error.  Instead
of retrying on lost reservation, leave the retry to the caller, and return
error.

Reported by:	kib
2017-01-31 03:40:13 +00:00
Justin Hibbits
37af2ad077 Drop the __GNUCLIKE_ASM guards around most atomic inlines.
There are no alternatives defined, so there's no point in keeping them.  Also,
they weren't around every inline asm block anyway.  Without __GNUCLIKE_ASM
defined, the guarded functions return garbage.

Reported by:	Andrew Thompson
2017-01-30 02:52:15 +00:00
Justin Hibbits
02f151d412 Add atomic_fcmpset_*() inlines for powerpc
Summary:
atomic_fcmpset_*() is analogous to atomic_cmpset(), but saves off the read value
from the target memory location into the 'old' pointer in the case of failure.

Requested by:	 mjg
Differential Revision: https://reviews.freebsd.org/D9325
2017-01-30 02:15:54 +00:00
Konstantin Belousov
0b39ffb35f On PowerPC 64bit, the linux-compat mb() definition is implemented with
lwsync instruction, which does not provide Store/Load barrier.  Fix
this by using "full" sync barrier for mb().

atomic_store_rel() does not need full barrier, change mb() call there
to the lwsync instruction if not hitting the known CPU erratas
(i.e. on 32bit).  Provide powerpc_lwsync() helper to isolate the
lwsync/sync compile time selection, and use it in atomic_store_rel()
and several other places which duplicate the code.

Noted by:	alc
Reviewed and tested by:	nwhitehorn
Sponsored by:	The FreeBSD Foundation
2015-11-24 09:13:21 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Justin Hibbits
181ca73b1a Small performance optimization. Clobber only cr0, rather than the entire CR.
Discussed with:	rdivacky,nwhitehorn
MFC after:	3 weeks
2014-04-11 06:17:44 +00:00
Andreas Tobler
feb86bbe4f Described in the man page but not implemented. Here it comes,
atomic_swap_32/64. The latter only for powerpc64.

MFC after:	1 month
2014-01-13 22:21:29 +00:00
Bjoern A. Zeeb
08c5f3303d Add a missing " to get closer to compiling. 2012-05-24 23:46:17 +00:00
Nathan Whitehorn
270dc329b7 Atomic operation acquire barriers also need to be isync on 64-bit systems. 2012-05-24 22:14:39 +00:00
Marcel Moolenaar
7097794901 Revert isync for ILP32 to sync as per my original change that I discussed
with Nathan. Leave __ATOMIC_ACQ as an isync as per Nathan.
2012-05-24 22:06:00 +00:00
Marcel Moolenaar
df0bef25eb Fix the memory barriers for CPUs that do not like lwsync and wedge or cause
exceptions early enough during boot that the kernel will do ithe same.
Use lwsync only when compiling for LP64 and revert to the more proven isync
when compiling for ILP32. Note that in the end (i.e. between revision 222198
and this change) ILP32 changed from using sync to using isync. As per Nathan
the isync is needed to make sure I/O accesses are properly serialized with
locks and isync tends to be more effecient than sync.

While here, undefine __ATOMIC_ACQ and __ATOMIC_REL at the end of the file
so as not to leak their definitions.

Discussed with: nwhitehorn
2012-05-24 20:45:44 +00:00
Nathan Whitehorn
bc96dccc69 Fix final bugs in memory barriers on PowerPC:
- Use isync/lwsync unconditionally for acquire/release. Use of isync
  guarantees a complete memory barrier, which is important for serialization
  of bus space accesses with mutexes on multi-processor systems.
- Go back to using sync as the I/O memory barrier, which solves the same
  problem as above with respect to mutex release using lwsync, while not
  penalizing non-I/O operations like a return to sync on the atomic release
  operations would.
- Place an acquisition barrier around thread lock acquisition in
  cpu_switchin().
2012-05-04 16:00:22 +00:00
Nathan Whitehorn
a4cbf436e7 Provide a clearer split between read/write and acquire/release barriers.
This should really, actually be correct now.
2012-04-22 22:27:35 +00:00
Nathan Whitehorn
a6349a998d Clarify what we are doing in r234583 a little better: eieio and isync do
not provide general barriers, but only barriers in the context of the
atomic sequences here. As such, make them private and keep the global
*mb() routines using a variant of sync.
2012-04-22 21:11:01 +00:00
Nathan Whitehorn
83ae3d5531 On non-64-bit systems (which generally don't have lwsync), use eieio and
isync to implement read and write barriers, following Appendix B.2 of
Book II of the architecture manual. This provides a 25% speed increase
to fork() on the PowerPC G4.
2012-04-22 20:23:34 +00:00
Nathan Whitehorn
6f26a88999 Use lwsync to provide memory barriers on systems that support it instead
of sync (lwsync is an alternate encoding of sync on systems that do not
support it, providing graceful fallback). This provides more than an order
of magnitude reduction in the time required to acquire or release a mutex.

MFC after:	2 months
2012-04-22 19:00:51 +00:00
Attilio Rao
dc6dc1f573 Merge r221614,221696,221737,221840 from largeSMP project branch:
Rewrite atomic operations for powerpc in order to achieve the following:
- Produce a type-clean implementation (in terms of functions arguments
  and returned values) for the primitives.
- Fix errors with _long() atomics where they ended up with the wrong
  arguments to be accepted.
- Follow the sys/type.h specifics that define the numbered types starting
  from standard C types.
- Let _ptr() version to not auto-magically cast arguments, but leave
  the burden on callers, as _ptr() atomic is intended to be used
  relatively rarely.

Fix cfi in order to support the latest point.

In collabouration with:	bde
Tested by:		andreast, nwhitehorn, jceel
MFC after:		2 weeks
2011-05-22 20:55:54 +00:00
Nathan Whitehorn
c3e289e1ce MFppc64:
Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
2010-07-13 05:32:19 +00:00
Marcel Moolenaar
7b30cb9c7c Unbreak previous commit. 2008-11-22 22:15:34 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
Marcel Moolenaar
bf8ad5a884 Fix copy-n-paste typos in free text. 2008-04-10 02:37:26 +00:00
Marcel Moolenaar
c563d53362 Reimplement atomic_add, atomic_clear, atomic_set and atomic_subtract
so that all implemented variants have proper prototypes. The 8-bit,
16-bit and 64-bit variants are not implemented.

This really fixes the current build breakages caused by type casting
and struct aliasing rules.
2008-04-09 01:00:35 +00:00
Marcel Moolenaar
ca6f63a1ed Quick fix for the kernel build breakage in netgraph and the
aliasing warning in libthr. A more elaborate fix is in the
works that makes sure that all variants have proper inline
functions with proper types.
2008-04-08 16:34:50 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
Jason Evans
8af8e94855 Define atomic_readandclear_ptr. 2007-11-27 06:34:15 +00:00
John Birrell
ba90c265b0 Implement the _long functions using u_long rather than trying to
cast as uint32_t which is defined as unsigned int. gcc doesn't want to
consider that there might not be much difference between an int and
a long on a 32 bit architecture.
2007-11-26 05:52:45 +00:00
John Birrell
912097517a Define atomic_cmpset_acq_long and atomic_cmpset_rel_long so that
they use casts rather than just assuming that the compiler will DTRT
without complaining.
2007-11-19 03:16:16 +00:00
Marcel Moolenaar
c108b80c8c Cast the arguments to atomic_*_ptr() when mapping it to atomic_*_32()
This is a minimal fix.

Approved by: re (kensmith)
2007-07-10 04:40:00 +00:00
John Baldwin
3c2bc2bf26 Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on:	i386, alpha, sparc64, arm (cognet)
Reviewed by:	arch@
Submitted by:	cognet (arm)
MFC after:	1 week
2005-09-27 17:39:11 +00:00
John Baldwin
80d52f16da Stop using the '+' constraint modifier with inline assembly. The '+'
constraint is actually only allowed for register operands.  Instead, use
separate input and output memory constraints.

Education from:	alc
Reviewed by:	alc
Tested on:	i386, alpha
MFC after:	1 week
2005-09-15 19:31:22 +00:00
John Baldwin
122eceef61 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
Joerg Wunsch
a5f50ef9e4 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
Peter Grehan
70134d768c - change all u_int_XX to uint_XX
- cast param for atomic_subtract_long, since Netgraph uses it.
2005-02-01 11:17:24 +00:00
Peter Grehan
8e9238c604 Fix bugs with operand ordering and unnecessary sync/eieio ops. Mostly
obtained from Alpha atomic.h

Approved by:  Benno
2003-01-18 11:28:36 +00:00
Peter Grehan
6e1073f023 Fixed branch labels
Approved by: benno
2002-09-19 04:39:59 +00:00
Benno Rice
dfc02c301d Make atomic_cmpset_32 correctly return 0 on failure. 2002-02-24 23:31:49 +00:00
Benno Rice
abc5579e8c Fix the atomic_*_32 operations. These were written before I had the ability
to test them properly and before I had a working knowledge of GCC asm
constraints.
2001-06-27 12:17:23 +00:00
Benno Rice
7a0e745f1a Don't initialise ret in atomic_cmpset_32.
Add more synchronisation.
2001-06-26 13:54:17 +00:00
Benno Rice
e2d53d7c4a Fix asm constraints for atomic_cmpset_32. This fix may also be needed
elsewhere.
2001-06-24 06:36:28 +00:00