Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
(cherry picked from commit b792434150)
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
(cherry picked from commit 254e4e5b77)
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
For stable/13 only, sys/arm/include/tls.h includes support for
ARM_TP_ADDRESS which is not present in main.
(cherry picked from commit 1a62e9bc00)
Since physical memory management is now handled by subr_physmem.c, the
need to keep this global array has diminished. It is not referenced
outside of early boot-time, and is populated by physmem_avail() in
pmap_bootstrap(). Just allocate the array on the stack for the duration
of its lifetime.
The check against physmap[0] in initriscv() can be dropped altogether,
as there is no consequence for excluding a memory range twice.
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34778
(cherry picked from commit 8a0339e679)
This increases the size of the user map from 256GB to 128TB. The kernel
map is left unchanged for now.
For now SV48 mode is left disabled by default, but can be enabled with a
tunable. Note that extant hardware does not implement SV48, but QEMU
does.
- In pmap_bootstrap(), allocate a L0 page and attempt to enable SV48
mode. If the write to SATP doesn't take, the kernel continues to run
in SV39 mode.
- Define VM_MAX_USER_ADDRESS to refer to the SV48 limit. In SV39 mode,
the region [VM_MAX_USER_ADDRESS_SV39, VM_MAX_USER_ADDRESS_SV48] is not
mappable.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 31218f3209)
Instead of having the one-off load_satp(), just use csr_write(). No
functional change intended.
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 2e956c30ca)
In SV48 mode, the top-level page will be an L0 page rather than an L1
page. Rename the field accordingly. No functional change intended.
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 82f4e0d0f0)
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9feff969a0)
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
PR: 259157
Reviewed by: mav, kib, markj (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32580
(cherry picked from commit 0d2224733e)
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
(cherry picked from commit 1adebe3cd6)
This is needed for LinuxKPI's _ioremap_attr. This reuses the generic
implementation introduced for aarch64, and itself requires implementing
pmap_kenter, which is trivial to do given riscv currently treats all
mapping attributes the same due to the Svpbmt extension not yet being
ratified and in hardware.
Reviewed by: markj, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32445
(cherry picked from commit 682c00a6ce)
When handling a kernel page fault, check explicitly that stval resides
in either the user or kernel address spaces, and make the page fault
fatal if not. Otherwise, a properly crafted address may appear to
pmap_fault() as a valid and present page in the kernel map, causing the
page fault to be retried continuously. This is mainly due to the fact
that the upper bits of virtual addresses are not validated by most of
the pmap code.
Faults of this nature should only occur due to some kind of bug in the
kernel, but it is best to handle them gracefully when they do.
Handle user page faults in the same way, sending a SIGSEGV immediately
when a malformed address is encountered.
Add an assertion to pmap_l1(), which should help catch other bugs of
this kind that make it this far.
Reviewed by: jrtc27, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31208
(cherry picked from commit 4a9f2f8b07)
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
(cherry picked from commit ab4ed843a3)
pmap_change_attr is required by drm-kmod so we need the function to
exist. Since the Svpbmt extension is on the horizon we will likely end
up with a real implementation of it, so this stub implementation does
all the necessary page table walking to validate the input, ensuring
that no new errors are returned once it's implemented fully (other than
due to out of memory conditions when demoting L2 entries) and providing
a skeleton for that future implementation.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31996
(cherry picked from commit 1be2e16df1)
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
(cherry picked from commit 720dc6bcb5)
This is consistent with other platforms, specifically arm and arm64. No
functional change intended.
Reviewed by: jrtc27
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 317113bb12)
This basically mirrors what already exists in ddb, but provides a
slightly improved interface. It allows the caller to specify the
watchpoint access type, and returns more specific error codes to
differentiate failure cases.
This will be used to support hardware watchpoints in gdb(4).
Stubs are provided for architectures lacking hardware watchpoint logic
(mips, powerpc, riscv), while other architectures are added individually
in follow-up commits.
Reviewed by: jhb, kib, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
(cherry picked from commit 763107f26c)
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.
Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817, but the fix did not make its way
to RISC-V.
Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.
PR: 253237
Reported by: Jenkins, arichardson
Reviewed by: kp, arichardson
(cherry picked from commit 0d3b3beeb2)
This setting limits the amount of memory that can be allocated to UMA.
On systems with a direct map and ample KVA, however, there is no reason
for VM_KMEM_SIZE_SCALE to be larger than 1. This appears to have been
inherited from the 32-bit ARM platform definitions.
Also remove VM_KMEM_SIZE_MIN, which is not needed when
VM_KMEM_SIZE_SCALE is defined to be 1.[*]
Reviewed by: alc, kp, kib
Reported by: alc [*]
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
Differential Revision: https://reviews.freebsd.org/D28225
The purpose of this KASSERT is to ensure that we do not run out of space
in the early devmap. However, the devmap grew beyond its initial size of
2MB in r336519, and this assertion did not grow with it.
A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or
1.977 MB, so it is just barely able to fit without triggering the
assertion, provided no other devices are mapped before it. With the
addition of `options GDB` in GENERIC by bbfa199cbc, the uart is now
mapped for the purposes of a debug port, before mapping the framebuffer.
The presence of both these conditions pushes the selected virtual
address just below the threshold, triggering the assertion.
To fix this, use the correct size of the devmap, defined by
PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define
it for that platform as well (although it is a different size).
PR: 25241
Reported by: gbe
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
These implementation IDs are defined in the SBI spec, so we should print
their name if detected.
Submitted by: Danjel Qyteza <danq1222@gmail.com>
Reviewed by: jhb, kp
Differential Revision: https://reviews.freebsd.org/D27660
Prefer atomics to critical section. This reduces the cost of the
increment operation and removes the possibility of it being interrupted
by counter_u64_zero().
Use CPU_FOREACH() macro to skip absent CPUs.
Replace hand-rolled address calculation with zpcpu_get().
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27536
- Push the kstack_contains check down into unwind_frame() so that it
is honored by DDB and DTrace.
- Check that the trapframe for an exception frame is contained in the
traced thread's kernel stack for DDB traces.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27357
This allows GDB to print more useful backtraces when setting a breakpoint
on an assembly function.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D27177
Version 0.2 of the SBI specification [1] marked the existing SBI
functions as "legacy" in order to move to a newer calling convention. It
also introduced a set of replacement extensions for some of the legacy
functionality. In particular, the TIME, IPI, and RFENCE extensions
implement and extend the semantics of their legacy counterparts, while
conforming to the newer version of the spec.
Update our SBI code to use the new replacement extensions when
available, and fall back to the legacy ones. These will eventually be
dropped, when support for version 0.2 is ubiquitous.
[1] https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
Submitted by: Danjel Q. <danq1222@gmail.com>
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26953
S-mode software has write access to the SIP.SSIP bit, so instead of
making a second round-trip through the SBI we can clear it ourselves.
The SBI spec has deprecated this function for this exactly this reason.
Submitted by: Danjel Q. <danq1222@gmail.com
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26952
The existing names were inherited from arm64, but we should prefer
RISC-V terminology. Change the prefix to SCAUSE, and further change the
names to better match the RISC-V spec and be more consistent with one
another. Also, remove two codes that are not defined for S-mode (machine
and hypervisor ecall).
While here, apply style(9) to some condition checks.
Reviewed by: kp
Discussed with: jrtc27
Differential Revision: https://reviews.freebsd.org/D26918
Every other architecture defines this and this is required for
interrupts to work when using QEMU's PCI VirtIO devices (which all
report an interrupt line of 0) for two reasons.
Firstly, interrupt line 0 is wrong; they use one of 0x20-0x23 with the
lines being cycled across devices like normal. Moreover, RISC-V uses
INTRNG, whose IRQs are virtual as indices into its irq_map, so even if
we have the right interrupt line we still need to try and route the
interrupt in order to ultimately call into intr_map_irq and get back a
unique index into the map for the given line, otherwise we will use
whatever happens to be in irq_map[line] (which for QEMU where the line
is initialised to 0 results in using the first allocated interrupt,
namely the RTC on IRQ 11 at time of commit).
Note that pci_assign_interrupt will still do the wrong thing for INTRNG
when using a tunable, as it will bypass INTRNG entirely and use the
tunable's value as the index into irq_map, when it should instead
(indirectly) call intr_map_irq to allocate a new entry for the given
IRQ and treat the tunable as stating the physical line in use, which is
what one would expect. This, however, is a problem shared by all INTRNG
architectures, and not exclusive to RISC-V.
Reviewed by: kib
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D26564
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
There was an additional 7 bytes of compiler-inserted padding at the
end of the structure visible via 'ptype /o' in gdb.
Reviewed by: mhorne
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25867
RISC-V doesn't support floating-point exceptions.
RISC-V Instruction Set Manual: Volume I: User-Level ISA, 11.2 Floating-Point
Control and Status Register: "As allowed by the standard, we do not support
traps on floating-point exceptions in the base ISA, but instead require
explicit checks of the flags in software. We considered adding branches
controlled directly by the contents of the floating-point accrued exception
flags, but ultimately chose to omit these instructions to keep the ISA simple."
We still need these functions, because some applications (notably Perl) call
them, but we cannot provide a meaningful implementation.
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D25740
This implementation doesn't have any major deviations from the other EFI
ports. I've copied the boilerplate from arm and arm64.
I've tested this with the following boot flows:
OpenSBI (M-mode) -> u-boot (S-mode) -> loader.efi -> FreeBSD
OpenSBI (M-mode) -> u-boot (S-mode) -> boot1.efi -> loader.efi -> FreeBSD
Due to the way that u-boot handles secondary CPUs, OpenSBI >= v0.7 is required,
as the HSM extension is needed to bring them up explicitly. Because of this,
using BBL as the SBI implementation will not be possible. Additionally, there
are a few recent u-boot changes that are required as well, all of which will be
present in the upcoming v2020.07 release.
Looks good: emaste
Differential Revision: https://reviews.freebsd.org/D25135