This file contains the vmm device file implementation. Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c. This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.
Machine-dependent ioctls continue to be handled in machine-dependent
code. To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls. Each table entry can set flags
which determine which locks need to be held in order to execute the
handler. vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.
No functional change intended. There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same. For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle. This would involve changing include paths for
userspace, though.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46431
There is a small difference between the arm64 and amd64 implementations:
the latter makes use of a "scope" to exclude AMD-specific stats on Intel
systems and vice-versa. Replace this with a more generic predicate
callback which can be used for the same purpose.
No functional change intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46430
These just need to include the common code with macros to ensure it is
built correctly.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D46083
We can share some of the vmm code between VHE and non-VHE modes. To
support this create new files that include the common code and create
macros to name what will be the common functions.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D46072
To simplify disabling the kernel sanitizers in some files add
NOSAN_CFLAGS and NOSAN_C variables. These are CFLAGS and NORMAL_C with
the sanitizer flags removed.
While here add MSAN_CFLAGS to simplify keeping KMSAN in kern_kcov.c
Reviewed by: khng, brooks, imp, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45498
A new instance of using ld with -T to bring in the kernel ld script
crept into the tree after I originally did the refactoring. It too needs
-L ${SYSDIR}/conf added.
Fixes: 37d6d682af
Sponsored by: Netflix
These are only included in the amd64 vmm code, so it doesn't make sense
to list them unconditionally.
PR: 280171
Reviewed by: wosch, imp, emaste
Differential Revision: https://reviews.freebsd.org/D45964
This code runs at EL2 while the kernel runs at EL1. We build these
files for EL2 through a dependency in vmm_hyp_blob.elf.full so there
is no need to include them in SRCS.
Reviewed by: imp, kib, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45467
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
- Add the BTI note to the remaining assembly files
- Mark ptrauth.c as protected by BTI
- Disable -zbti-report for vmm hypervisor switching code as it's not
used there.
The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.
Reviewed by: imp, kib, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45466
To silence a linker warning about _start being missing. This blob
contains code executed at EL2 and is only meant to be entered via
exception handlers.
Reviewed by: bz, emaste
Fixes: 47e073941f ("Import the kernel parts of bhyve/arm64")
Differential Revision: https://reviews.freebsd.org/D44735
To support virtual machines on arm64 add the vmm code. This is based on
earlier work by Mihai Carabas and Alexandru Elisei at University
Politehnica of Bucharest, with further work by myself and Mark Johnston.
All AArch64 CPUs should work, however only the GICv3 interrupt
controller is supported. There is initial support to allow the GICv2
to be supported in the future. Only pure Armv8.0 virtualisation is
supported, the Virtualization Host Extensions are not currently used.
With a separate userspace patch and U-Boot port FreeBSD guests are able
to boot to multiuser mode, and the hypervisor can be tested with the
kvm unit tests. Linux partially boots, but hangs before entering
userspace. Other operating systems are untested.
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
Sponsored by: University Politehnica of Bucharest
Differential Revision: https://reviews.freebsd.org/D37428
Some, particularly KASAN, may insert redzones around global symbols,
resulting in incorrect offset definitions because genassym.sh (ab)uses
symbol sizes to assign semantic meaning.
(Ideally I would be able to define this pattern in one place, but I
haven't found a way to define a GENSYM_CFLAGS that actually works for
all of the consumers (kern.post.mk, kmod.mk, sys/conf/files*).)
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
For the same reason as the original https://reviews.freebsd.org/D9659:
-flto=<N>, -flto=full, and -flto=thin also produce the GIMPLE/bitcode
which is not supported by genassym, so filter those out as well.
Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/898
kmod.mk appends the value of SRCS.${KERN_OPT} for each defined kernel
option to SRCS. This helper is shorter than appending to SRCS under
explicit checks on KERN_OPTS.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38738
Currently, AMD-vi PCI-e passthrough will lead to the following lines in
dmesg:
"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."
After some tracing, the problem is due to the interaction with
amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the
identification of AMD-vi IVHD is done by walking over the ACPI IVRS
table and ivhdX device_ts are added under the acpi bus, while there are
no driver handling the corresponding IOMMU PCI function. In
amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX
device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is
called on ivhdX. the IOMMU pci function device_t is only used for
pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci
function, the IOMMU PCI function device_t's dinfo->cfg.msi is never
updated to reflect the supposed msi_data and msi_addr. So the msi_data
and msi_addr stay in the value 0. When pci_driver_added() tried to loop
over the children of a pci bus, and do pci_cfg_restore() on each of
them, msi_addr and msi_data with value 0 will be written to the MSI
capability of the IOMMU pci function, thus explaining the errors in
dmesg.
This change includes an amdiommu driver which currently does attaching,
detaching and providing DEVMETHODs for setting up and tearing down
interrupt. The purpose of the driver is to prevent pci_driver_added()
from calling pci_cfg_restore() on the IOMMU PCI function device_t.
The introduction of the amdiommu driver handles allocation of an IRQ
resource within the IOMMU PCI function, so that the dinfo->cfg.msi is
populated.
This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.
Sponsored by: The FreeBSD Foundation
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28984
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
The OBJS_DEPEND_GUESS mechanism was making vmx_genassym.o depend
on all headers along with vmx_assym.h, though vmx_assym.h depends
on having vmx_genassym.o present to generate. Moving the headers
to DPSRCS is enough to resolve the issue as they will no longer
be implicit dependencies for all objects. Because of this we
need explicit OBJS_DEPEND_GUESS entries to ensure the headers
are generated when needed for the *_support.o files that need
them.
X-MFC-With: r326552
MFC after: 2 weeks
Sponsored by: Dell EMC
The build process generates *assym.h using nm from *genassym.o (which is
in turn created from *genassym.c).
When compiling with link-time optimization (LTO) using -flto, .o files
are LLVM bitcode, not ELF objects. This is not usable by genassym.sh,
so remove -flto from those ${CC} invocations.
Submitted by: George Rimar
Reviewed by: dim
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9659
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.
Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.
The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks
problems than it solves. SYSDIR is already defined almost always and
can be used instead. Working around the one case where it isn't is
much easier than working around the fact that @ may not exist in 18
other places.
Differential Revision: https://reviews.freebsd.org/D1100
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).
Discussed with: grehan
Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use
and update vmx_support.S's include to match. Add vmx_assym.h to the
SRCS to that it gets properly added to the dependency list. Add
vmx_support.S to SRCS as well, so it gets built and needs fewer
special-case goo. Remove now-redundant special-case goo. Finally,
vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS}
explicitly, that's all taken care of by beforedepend.
With these items fixed, we no longer build vmm.ko every single time
through the modules on a KERNFAST build.
Sponsored by: Netflix
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.
The VT-x code has the following types of MSRs:
- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).
- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).
- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
Reviewed by: grehan
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.
Reviewed by: jhb, neel
Approved by: neel (co-mentor)
bhyve supports a single timer block with 8 timers. The timers are all 32-bit
and capable of being operated in periodic mode. All timers support interrupt
delivery using MSI. Timers 0 and 1 also support legacy interrupt routing.
At the moment the timers are not connected to any ioapic pins but that will
be addressed in a subsequent commit.
This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).
upcoming in-kernel device emulations like the HPET.
The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.
Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
in the kernel. This abstraction was redundant because the only device emulated
inside vmm.ko is the local apic and it is always at a fixed guest physical
address.
Discussed with: grehan