On all three platforms supported by vmm, we have mostly duplicated code
to manage guest physical memory regions. Deduplicate much of this code
and move it into sys/dev/vmm/vmm_mem.c.
To avoid exporting struct vm outside of machdep vmm.c, add a new
struct vm_mem to contain the memory segment descriptors, and add a
vm_mem() accessor, akin to vm_vmspace(). This way vmm_mem.c can
implement its routines without needing to see the layout of struct vm.
The handling of the per-VM vmspace is also duplicated but will be moved
to vmm_mem.c in a follow-up patch.
On amd64, move the ppt_is_mmio() check out of vm_mem_allocated() to keep
the code MI, as PPT is only implemented on amd64. There are only a
couple of callers, so this is not unreasonable.
No functional change intended.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D48270
It is no longer used anywhere (since 62508c531e), and not defined for
arm64 which also supports ACPI.
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D48486
This flag was used as a transition for differing pcib implementations.
Today it is defined for all supported architectures, and can be removed.
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D48485
These macros have substantially identical implementations on each
platform. Use roundup2/rounddown2 for round_page/trunc_page.
This version standardizes on not using explicit casts and instead
preserving the original type. A couple of tweaks were required to
make this work.
Reviewed by: brooks, kib, markj
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D48450
This macro has not been in use since commit "inline atomics and allow tied
modules to inline locks" (r335873, f4b3640475).
Reviewed by: markj, kib, emaste, imp
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48061
In order to match reality, allow using these functions with pointers on
const objects, and bring us closer to C11.
Remove the '+' modifier in the atomic_load_acq_64_i586()'s inline asm
statement's constraint for '*p' (the value to load). CMPXCHG8B always
writes back some value, even when the value exchange does not happen in
which case what was read is written back. atomic_load_acq_64_i586()
further takes care of the operation atomically writing back the same
value that was read in any case. All in all, this makes the inline
asm's write back undetectable by any other code, whether executing on
other CPUs or code on the same CPU before and after the call to
atomic_load_acq_64_i586(), except for the fact that CMPXCHG8B will
trigger a #GP(0) if the memory address is part of a read-only mapping.
This unfortunate property is however out of scope of the C abstract
machine, and in particular independent of whether the 'uint64_t' pointed
to is declared 'const' or not.
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46887
Previously, VMXON would be executed on a resume, contrary to proper
initalization. The contents of MSR_IA32_FEATURE_CONTROL may be lost on
suspension, therefore must be restored. Likewise, the VMX Enable bit may be
cleared upon suspend, requiring it to be re-set.
Concretely disable VMX on suspend, and re-enable it on resume.
Note: any IOMMU context will remain lost for any enabled vmm devices.
Signed-off-by: Joshua Rogers <Joshua@Joshua.Hu>
Reviewed by: jhb,imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1419
Following arm64 and risc-v, move definitions that describe
hardware-enforced layout of PTEs and #PF error bits, into a dedicated
header.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47749
For current architectures, these are just aliases for the existing
operation on the relevant scalar integer.
Reviewed by: imp, kib
Obtained from: CheriBSD
Sponsored by: AFRL, DARPA
Differential Revision: https://reviews.freebsd.org/D47631
vmm.h is required for VM_MAX_SUFFIXLEN. vmm_snapshot.h is required for
struct vm_snapshot_meta.
This is a prerequisite for including vmm_dev.h in the headers parsed by
libsysdecode.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D46485
There is no reason to keep them in vmm_dev.h. No functional change
intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46432
This file contains the vmm device file implementation. Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c. This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.
Machine-dependent ioctls continue to be handled in machine-dependent
code. To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls. Each table entry can set flags
which determine which locks need to be held in order to execute the
handler. vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.
No functional change intended. There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same. For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle. This would involve changing include paths for
userspace, though.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46431
All architectures enable NEW_PCIB in DEFAULTS (arm being the most recent
to do so in 121be55599 (arm: Set NEW_PCIB in DEFAULTS rather than a
subset of kernel configs")), so it's time we removed the legacy code
that no longer sees much testing and has a significant maintenance
burden.
Reviewed by: jhb, andrew, emaste
Differential Revision: https://reviews.freebsd.org/D32954
These changes mostly apply to the !__SEG_GS section, which is no longer
the normal compilation path. They're made to be consistent with changes
to i386.
- Add missing cc clobber to __PCPU_ADD (which is currently unused).
- Allow the compiler the opportunity to marginally improve code
generation from __PCPU_PTR by letting it figure out how to do the add
(also removing the addition fixes a missing cc clobber).
- Quiet gcc -Warray-bounds by using constant operands instead of bogus
memory references.
- Remove the struct __s __s temporaries, just cast through the type.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45827
Use a constant input operand instead of an output operand to tell the
compiler about OFFSETOF_MONITORBUF. If we tell it we are writing to
*(u_int *)OFFSETOF_MONITORBUF, it rightly complains, but we aren't. The
memory clobber already covers the necessary semantics for the compiler.
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45694
The idea here is to avoid a memory access and conditional branch per
probe site. Instead, the probe is represented by an "unreachable"
unconditional function call. asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record. Each SDT probe carries a list
of tracepoints.
When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label. The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general. The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.
Per gallatin@ in D43504, this approach has less overhead when probes are
disabled. To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case. It could be re-added later if need be.
The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
names. The SDT macros let the programmer specify the function and
module names, but this is really a bug and shouldn't have been
allowed. The intent was to be able to have the same probe in
multiple functions and to let the user restrict which probes actually
get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
to include blocks of code in the out-of-line path. For example:
if (SDT_PROBES_ENABLED()) {
int reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
SDT_PROBE1(proc, , , exit, reason);
}
could be written
SDT_PROBE1_EXT(proc, , , exit, reason,
int reason;
reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
);
In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D44483
FreeBSD's boot times have decreased to the point where vm_page array
initialization represents a significant fraction of the total boot time.
For example, when booting FreeBSD in Firecracker (a VMM designed to
support lightweight VMs) with 128MB and 1GB of RAM, vm_page
initialization consumes 9% (3ms) and 37% (21.5ms) of the kernel boot
time, respectively. This is generally relevant in cloud environments,
where one wants to be able to spin up VMs as quickly as possible.
This patch implements lazy initialization of (most) page structures,
following a suggestion from cperciva@. The idea is to introduce a new
free pool, VM_FREEPOOL_LAZYINIT, into which all vm_page structures are
initially placed. For this to work, we need only initialize the first
free page of each chunk placed into the buddy allocator. Then, early
page allocations draw from the lazy init pool and initialize vm_page
chunks (up to 16MB, 4096 pages) on demand. Once APs are started, an
idle-priority thread drains the lazy init pool in the background to
avoid introducing extra latency in the allocator. With this scheme,
almost all of the initialization work is moved out of the critical path.
A couple of vm_phys operations require the pool to be drained before
they can run: vm_phys_find_range() and vm_phys_unfree_page(). However,
these are rare operations. I believe that
vm_phys_find_freelist_contig() does not require any special treatment,
as it only ever accesses the first page in a power-of-2-sized free page
chunk, which is always initialized.
For now the new pool is only used on amd64 and arm64, since that's where
I can easily test and those platforms would get the most benefit.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D40403
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Elimininate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.
Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision: https://reviews.freebsd.org/D45170
Differential Revision: https://reviews.freebsd.org/D45235
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstraping the kernel to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
In a follow-up revision the gdb stub will support sending an XML target
description to gdb, which lets us send additional registers, including
the ones added in this patch.
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D43665
The return values of copyin() and copyout() must be checked.
vm_snapshot_buf_cmp() is unused by the kernel and was incorrectly
implemented, so just remove it.
Reviewed by: markj
Sponsored by: vStack
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43754
pmap_san_bootstrap() doesn't really do much, and it was hard-coding the
the bootstrap stack size defined in locore.S. Moreover, the name is a
bit confusing given the existence of pmap_bootstrap_san(). Just remove
it and call kasan_init_early() directly like we do on amd64. It will
not be used by KMSAN in a forthcoming patch series.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43403
These are in fact GPLv2 when distributed with the Linux kernel, but the
license also allows MIT if distributed separately. Add the markers to
avoid interference by automated tools.
Differential Revision: https://reviews.freebsd.org/D32796
Reviewed by: royger
This patch implements single-stepping for AMD CPUs using the RFLAGS.TF
single-stepping mechanism. The GDB stub requests single-stepping
using the VM_CAP_RFLAGS_TF capability. Setting this capability will
set the RFLAGS.TF bit on the selected vCPU, activate DB exception
intercepts, and activate POPF/PUSH instruction intercepts. The
resulting DB exception is then caught by the IDT_DB vmexit handler and
bounced to userland where it is processed by the GDB stub. This patch
also makes sure that the value of the TF bit is correctly updated and
that it is not erroneously propagated into memory. Stepping over PUSHF
will cause the vm_handle_db function to correct the pushed RFLAGS
value and stepping over POPF will update the shadowed TF bit copy.
Reviewed by: jhb
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42296
A precursor to merging them. The spacing differs quite a bit between
the i386 and amd64 hypercall headers, despite very similar content.
Consistently use tabs instead of spaces.
Reviewed by: royger
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
All of these used the 'immediately at beginning' variation of the
BSD-2-Clause license. This wasn't intentional, just what I copied from
from a random file in the tree back in 2005. It was not an intentional
decision.
The different arch bus.h files are a mix of BSD-2-Clause and
BSD-4-Clause that have various copyright holders (Charles M. Hannum,
Christopher G. Demetriou, The NetBSD Foundation and KATO Takenori), and
some of the content of these files were likely copied from there.
However, apart from the uncopyrightable interface lines, there are very
few comments. It's unclear if these comments are 'original material'
here to copyright, but to the extent that there is, license it under the
standard BSD-2-Clause copyright that's the norm for the project today.
In any event, the standard BSD-2-Clause is also closer to those
originals.
In addition, FreeBSD uses different type definitions than the original
NetBSD code in part. The comments that were copied have been copied a
lot, but appear in NetBSD's bus.h files in NetBSD 1.3.
While I'm here, assign the copyright, to the extent any exists from me,
to the FreeBSD Foundation. I just cut and pasted these into _bus.h from
the different machine files and those files have a rich history of
modification from the original imports from NetBSD over more than 25
years so it's tricky to say who, exactly, wrote each bit. Given the size
of the files, this seems like the best compromise. Also add an
acknowledgement to the NetBSD 1.3 bus.h files and their authors (there
were no additional FreeBSD authors listed in the various
sys/*/include/bus.h files). Finally, use the SPDX identifier instead of
multiple copies of the text.
Differential Revision: https://reviews.freebsd.org/D42532
Sponsored by: Netflix
Hardware with more than 256 CPU cores is currently available and will
become increasingly common over FreeBSD 14's lifetime. Increase MAXCPU
in the amd64 GENERIC kernel configuration to 1024.
Earlier commits increased some related limits. These prerequisite
commits include at least:
- d7ed40243769 Increase MAX_APIC_ID safeguard to 0x800
- d1639e43c5 cpuset: increase userland maximum size to 1024
Global and allocated arrays sized by MAXCPU result in excessive bloat
on systems with lower core counts. In addition, some code used u_char
(8 bits) to hold a CPU index, which is not valid if MAXCPU is greater
than 256.
A number of recent commits addressed these sorts of issues, including
at least:
- 133935d26f pf: atomically increment state ids
- 74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
- 78cfa762eb callout: Move per-CPU callout state into the dpcpu region
- 42f722e721 amd64: store pcids pmap data in pcpu zone
- 9801e7c275 smp_topo: dynamically allocate group array
- 9fb6718d1b smp: Dynamically allocate the stoppcbs array
- 2bb16c6352 x86: retire use of intr_bind
There are some additional allocations still to be converted and
more scalability work is required to make effective use of very high
core count systems, but this change allows us to boot on these systems
and provides a Kernel Binary Interface (KBI) for the FreeBSD 14 release
that supports these configurations.
Special thanks to AMD for providing hardware to test these changes.
PR: 269572
Reviewed by: des
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36838