Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
(cherry picked from commit b792434150)
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
(cherry picked from commit 254e4e5b77)
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
For stable/13 only, sys/arm/include/tls.h includes support for
ARM_TP_ADDRESS which is not present in main.
(cherry picked from commit 1a62e9bc00)
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9feff969a0)
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
PR: 259157
Reviewed by: mav, kib, markj (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32580
(cherry picked from commit 0d2224733e)
Allow inclusion of sys/reg.h w/o pre-requisites by making arm's machine/reg.h
self-contained.
Sponsored by: Netflix
(cherry picked from commit b57e0aa4ef)
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
(cherry picked from commit 1adebe3cd6)
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ecbbe83144)
Remote detritis copied, apparently, from sparc. utrap has never been
used on arm, so it's safe to just remove it.
Sponsored by: Netflix
(cherry picked from commit c47a4a2375)
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
(cherry picked from commit ab4ed843a3)
This reverts commit b684d812fc.
It causes an issue on a pfsense routing workload where memory
fragmentation prevents the necessary consecutive pages from being
readily available.
Reported by: pfsense (mjg, scottl)
Approved by: ian
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D31244
(cherry picked from commit 5647f85ade)
See 3f6867ef63 for additional context.
It is also needed for OpenZFS performance and stability.
Reviewed by: ian (arm), imp
Differential Revision: https://reviews.freebsd.org/D31244
(cherry picked from commit b684d812fc)
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
(cherry picked from commit 720dc6bcb5)
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
(cherry picked from commit 7446b0888d)
Implement wrappers around the existing debug_monitor interface, to be
consumed by MI kernel debugger code.
For now, the various db_printf() calls in this code remain. In the
future, they could be converted to printf() or removed altogether, to
properly decouple the DDB and GDB options.
Reviewed by: jhb, kib, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
(cherry picked from commit 5a2933d0bf)
This imposes a fairly severe limitation on space available for mmap that
was not noticed prior to commit. Unfixed mmap will only map from
[data + MAXSIZE, end of user VA space], bringing the amount of usable space
down way too low for non-trivial link jobs (for instance).
Reported by: mmel
The fix in bd03acedb8 worked for 32-bit
ops, and for 64-bit ops for bit arguments of 0 - 95, but then was broken
for operations on the high 32 bits after that.
Reviewed by: markj, mmel
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D27897
An 8MB max stack size is quite limiting in today's world, and in-fact is
the *default* stack size for almost every other arch (including mips).
Raise the default to 4MB (should be pretty reasonable) and the max to 64MB.
NetBSD made a similar move back in 2015 and raised MAXDSIZ to 1856 at the
same time, so let's just roll that in as well. They later lowered it, but
eventually raised it back to 1856 in order to build rust.
This was noticed while looking at qemu-bsd-user's default stack sizes and
growth behavior (or lack thereof).
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D27218
This can be handy if gdb's stack unwinder fails, for example because of
a bug in kgdb's trap frame unwinder.
PR: 251463
Submitted by: Dmitry Salychev <dsl@mcusim.org>
MFC after: 1 week
MPIDR represents physical locality of given core and it should be used as
the only viable/robust connection between cpuid (which have zero relation to
cores topology) and external description (for example in FDT). It can be
used for determining which interrupt is associated to given per-CPU PMU
or by scheduler for determining big/little core or cluster topology.
MFC after: 3 weeks
Modern ARM systems do not have an FPA unit but GDB reserves register
indices for FPA registers and expects the stub to know their sizes.
PR: 251022
Submitted by: Dmitry Salychev <dsl@mcusim.org>
MFC after: 2 weeks
The arm configs that required it have been removed from the tree.
Removing this option makes the callout code easier to read and
discourages developers from adding new configs without eventtimer
drivers.
Reviewed by: ian, imp, mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27270
in sync with (most) other architectures. No functional changes.
Reviewed by: manu
Tested by: mmel
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26604
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
Big endian and armv4 mean that we are now down to only two supported
variants. A future change will use MACHINE_ARCH in assembly which
does not support C-style string concatentation and thus needs
MACHINE_ARCH defined as a single string.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25211
The trampoline code used for loading gzipped a.out kernels on arm was
removed in r350436. A portion of this code allowed for DDB to find the
symbol tables when booting without loader(8), and some of this was
untouched in the removal. Remove it now.
Differential Revision: https://reviews.freebsd.org/D24950
The arm_physmem interface found in arm's MD code provides a convenient
set of routines for adding/excluding physical memory regions and
initializing important kernel globals such as Maxmem, realmem,
phys_avail[], and dump_avail[]. It is especially convenient for FDT
systems, since we can use FDT parsing functions and pass the result
directly to one of these physmem routines. This interface is already in
use on arm and arm64, and can be used to simplify this early
initialization on RISC-V as well.
This requires only a couple trivial changes:
- Move arm_physmem_kernel_addr to arm/machdep.c. It is unused on arm64,
and manipulated entirely in arm MD code.
- Convert arm32_btop/arm64_btop to atop. This is equivalently defined
on all architectures.
- Drop the "arm" prefix.
Reviewed by: manu, emaste ("looks reasonable")
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24153
The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.
The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree. MD specifics can be
refined individually afterwards.
The generic implementations may not be ideal for your architecture; feel
free to implement better versions. If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name. To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.
Include _atomic_subword.h in arm and arm64 machine/atomic.h.
For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long. There are other questionably disabled primitives,
but I didn't run into them, so I left them alone. KCSAN is only built for
amd64 in tinderbox for now.
Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.
Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().
On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.
Reviewed by: kevans, rlibby (previous versions both)
Differential Revision: https://reviews.freebsd.org/D22963
As defined in atomic(9) and implemented on other architectures, the
atomic(9) functions all act on unsigned pointers and types. Prior to this
revision, arm implemented some atomic(9) 'long' sized routines with correct
unsigned type, but others were incorrectly signed.
Reviewed by: tinderbox
Sponsored by: Dell EMC Isilon
This reverts r177661. The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway. Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.
PR: 243711
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23512
The arm kernel stack unwinder has apparently never been able to unwind when
the path of execution leads through a kernel module. There was code that
tried to handle modules by looking for the unwind data in them, but it did
so by trying to find symbols which have never existed in arm kernel
modules. That caused the unwind code to panic, and because part of panic
handling calls into the unwind code, that just created a recursion loop.
Locating the unwind data in a loaded module requires accessing the Elf
section headers to find the SHT_ARM_EXIDX section. For preloaded modules
those headers are present in a metadata blob. For dynamically loaded
modules, the headers are present only while the loading is in progress; the
memory is freed once the module is ready to use. For that reason, there is
new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
unwind info while the headers are loaded. The values are saved into new
fields in the linker_file structure which are also conditional on __arm__.
In arm/unwind.c there is new code to locally cache the per-module info
needed to find the unwind tables. The local cache is crafted for lockless
read access, because the unwind code often needs to run in context where
sleeping is not allowed. A large comment block describes the local cache
list, so I won't repeat it all here.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650