The headers were mostly identical on amd64 and i386.
No functional change intended.
Reviewed by: cperciva, mav, imp, kib, jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f06f1d1fdb)
We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers. Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ab12e8db29)
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
(cherry picked from commit 90d4da6225)
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 09100f936b)
This is in preference to simply filling the cpuset, and allows the
conditional in pmap_invalidate_cpu_mask() to be elided.
Also export pmap_invalidate_cpu_mask() outside of pmap.c for use in a
subsequent commit.
Suggested by: kib
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 71e6e9da22)
KMSAN requires two shadow maps, each one-to-one with the kernel map.
Allocate regions of the kernels PML4 page for them. Add functions to
create mappings in the shadow map regions, these will be used by the
KMSAN runtime.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f95f780ea4)
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3ead60236f)
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled. Some of the
ASAN instrumentation increases stack usage and it's enough to
trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
assigned to td_intr_frame. Otherwise, an interrupt in a function
which creates a poisoned alloca region can trigger false positives.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f115c06121)
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map. This region is the shadow map. The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid. Various kernel allocators call kasan_mark() to update
the shadow map.
Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.
The shadow map uses one byte per eight bytes in the kernel map. In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.
When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29417
(cherry picked from commit 6faf45b34b)
MFC note: this commit changes layout of td_md for amd64, resulting in
static checks for struct thread ABI in kern_thread.c to fail. Next
two commits restore the layout, I decided to not overcomplicate the
merge and not do the work that is going to be overwritten immediately.
(cherry picked from commit df8dd6025a)
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.
Reviewed by: kib
(cherry picked from commit e4b8deb222)
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.
Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692
(cherry picked from commit 5d2d599d3f)
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
(cherry picked from commit 7446b0888d)
Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.
Reviewed by: jhb, kib, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
(cherry picked from commit 15dc1d4452)
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
(cherry picked from commit f8a6ec2d57)
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.
No functional change intended.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29103
(cherry picked from commit 435c7cfb24)
All page zeroing is using temporal stores with rep movs*, the routine is
unused for several years.
Should a need arise for zeroing using non-temporal stores, a more
optimized variant can be implemented with a more descriptive name.
(cherry picked from commit d1de5698df)
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.
This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.
Sponsored by: Citrix Systems R&D
Similar to the recent patch to arm's gdb stub in r368414, allow GDB to
update the contents of most general purpose registers.
Reviewed by: cem, jhb, markj
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 44
Differential Revision: https://reviews.freebsd.org/D27642
There is no need for these to be function pointers since they are
never modified post-module load.
Rename AMD/Intel ops to be more consistent.
Submitted by: adam_fenn.io
Reviewed by: markj, grehan
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27375
i386 and the rest of supported architectures by defining KERNLOAD in the
vmparam.h and getting rid of magic constant in the linker script, which albeit
documented via comment but isn't programmatically accessible at a compile time.
Use KERNLOAD to eliminate another (matching) magic constant 100 lines down
inside unremarkable TU "copy.c" 3 levels deep in the EFI loader tree.
Reviewed by: markj
Approved by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27355
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).
This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.
While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212
Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number. However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.
Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation. Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.
Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.
Reviewed by: grehan, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26910
The amd64 kernel handles certain types of exceptions on a dedicated
stack. Currently the sizes of these stacks are all hard-coded to
PAGE_SIZE, but for at least NMI handling it can be useful to use larger
stacks. Add constants to intr_machdep.h to make this easier to tweak.
No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27076
on some more advanced C features.
This fixes gcc-toolchain build of exception.S.
Reported and tested by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
RIght now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread. This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.
Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern. Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.
Reported and tested by: jhb (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25511
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64). If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.
Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems>
PR: 250043
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26643
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129