Commit graph

54 commits

Author SHA1 Message Date
Mark Johnston
b9ef152bec vmm: Merge vmm_dev.c
This file contains the vmm device file implementation.  Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c.  This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.

Machine-dependent ioctls continue to be handled in machine-dependent
code.  To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls.  Each table entry can set flags
which determine which locks need to be held in order to execute the
handler.  vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.

No functional change intended.  There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same.  For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle.  This would involve changing include paths for
userspace, though.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46431
2024-08-26 18:41:39 +00:00
Mark Johnston
93e81baa1c vmm: Move duplicated stats code into a generic file
There is a small difference between the arm64 and amd64 implementations:
the latter makes use of a "scope" to exclude AMD-specific stats on Intel
systems and vice-versa.  Replace this with a more generic predicate
callback which can be used for the same purpose.

No functional change intended.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46430
2024-08-26 18:41:14 +00:00
Andrew Turner
bbe97db3c2 arm64/vmm: Add the VHE exception and switcher files
These just need to include the common code with macros to ensure it is
built correctly.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46083
2024-08-20 08:49:16 +00:00
Andrew Turner
55aa31480c arm64/vmm: Create functions to call into EL2
These will become ifuncs to enable VHE in a later change.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46075
2024-08-20 08:49:15 +00:00
Andrew Turner
3d61bcf1eb arm64/vmm: Start to extract code not needed by VHE
We can share some of the vmm code between VHE and non-VHE modes. To
support this create new files that include the common code and create
macros to name what will be the common functions.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46072
2024-08-20 08:49:15 +00:00
Andrew Turner
12a6257a96 sys/conf: Introduce NOSAN_CFLAGS and NOSAN_C
To simplify disabling the kernel sanitizers in some files add
NOSAN_CFLAGS and NOSAN_C variables. These are CFLAGS and NORMAL_C with
the sanitizer flags removed.

While here add MSAN_CFLAGS to simplify keeping KMSAN in kern_kcov.c

Reviewed by:	khng, brooks, imp, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45498
2024-08-20 08:49:15 +00:00
Warner Losh
f21a6a6a8f vmm: Build with proper ldscript on aarch64
A new instance of using ld with -T to bring in the kernel ld script
crept into the tree after I originally did the refactoring. It too needs
-L ${SYSDIR}/conf added.

Fixes: 37d6d682af
Sponsored by: Netflix
2024-07-31 23:47:13 -06:00
Warner Losh
e9ac41698b Remove residual blank line at start of Makefile
This is a residual of the $FreeBSD$ removal.

MFC After: 3 days (though I'll just run the command on the branches)
Sponsored by: Netflix
2024-07-15 16:43:39 -06:00
Mark Johnston
7cd9131591 vmm: Conditionalize addition of opt_*.h headers
These are only included in the amd64 vmm code, so it doesn't make sense
to list them unconditionally.

PR:		280171
Reviewed by:	wosch, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D45964
2024-07-14 14:29:14 -04:00
Andrew Turner
63f7a38343 vmm: Only link the arm64 hyp code in vmm.ko once
This code runs at EL2 while the kernel runs at EL1. We build these
files for EL2 through a dependency in vmm_hyp_blob.elf.full so there
is no need to include them in SRCS.

Reviewed by:	imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45467
2024-06-10 15:16:10 +00:00
Andrew Turner
c2e0d56f5e arm64: Support BTI checking in most of the kernel
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
 - Add the BTI note to the remaining assembly files
 - Mark ptrauth.c as protected by BTI
 - Disable -zbti-report for vmm hypervisor switching code as it's not
   used there.

The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.

Reviewed by:	imp, kib, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45466
2024-06-05 09:23:40 +00:00
Mark Johnston
52f3d65089 arm64/vmm: Define a dummy _start symbol in vmm_hyp_blob.elf
To silence a linker warning about _start being missing.  This blob
contains code executed at EL2 and is only meant to be entered via
exception handlers.

Reviewed by:	bz, emaste
Fixes:		47e073941f ("Import the kernel parts of bhyve/arm64")
Differential Revision:	https://reviews.freebsd.org/D44735
2024-04-11 11:04:22 -04:00
Mark Johnston
26173a919c arm64/vmm: Exclude more sanitizer compiler flags from certain files
Reported by:	rscheff
Fixes:		47e073941f ("Import the kernel parts of bhyve/arm64")
2024-03-22 00:20:34 -04:00
Gleb Smirnoff
7b133b34f8 vmm: fix standalone module build 2024-03-11 17:59:03 -07:00
Andrew Turner
47e073941f Import the kernel parts of bhyve/arm64
To support virtual machines on arm64 add the vmm code. This is based on
earlier work by Mihai Carabas and Alexandru Elisei at University
Politehnica of Bucharest, with further work by myself and Mark Johnston.

All AArch64 CPUs should work, however only the GICv3 interrupt
controller is supported. There is initial support to allow the GICv2
to be supported in the future. Only pure Armv8.0 virtualisation is
supported, the Virtualization Host Extensions are not currently used.

With a separate userspace patch and U-Boot port FreeBSD guests are able
to boot to multiuser mode, and the hypervisor can be tested with the
kvm unit tests. Linux partially boots, but hangs before entering
userspace. Other operating systems are untested.

Sponsored by:	Arm Ltd
Sponsored by:	Innovate UK
Sponsored by:	The FreeBSD Foundation
Sponsored by:	University Politehnica of Bucharest
Differential Revision:	https://reviews.freebsd.org/D37428
2024-02-21 18:55:32 +00:00
Mark Johnston
8e1a7e29b6 sanitizers: Avoid building genassym.c and genoffset.c with sanitizers
Some, particularly KASAN, may insert redzones around global symbols,
resulting in incorrect offset definitions because genassym.sh (ab)uses
symbol sizes to assign semantic meaning.

(Ideally I would be able to define this pattern in one place, but I
haven't found a way to define a GENSYM_CFLAGS that actually works for
all of the consumers (kern.post.mk, kmod.mk, sys/conf/files*).)

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
2024-01-12 16:09:14 -05:00
Alex Xu (Hello71)
c6ae97c44d sys: ${CFLAGS:N-flto} -> ${CFLAGS:N-flto*}
For the same reason as the original https://reviews.freebsd.org/D9659:
-flto=<N>, -flto=full, and -flto=thin also produce the GIMPLE/bitcode
which is not supported by genassym, so filter those out as well.

Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/898
2023-12-26 16:31:51 -07:00
Warner Losh
031beb4e23 sys: Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:58 -06:00
John Baldwin
c3dd45c47b sys/modules: Make use of SRCS.${KERN_OPT}.
kmod.mk appends the value of SRCS.${KERN_OPT} for each defined kernel
option to SRCS.  This helper is shorter than appending to SRCS under
explicit checks on KERN_OPTS.

Reviewed by:	imp
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D38738
2023-03-01 10:32:30 -08:00
Ka Ho Ng
74ada297e8 AMD-vi: Fix IOMMU device interrupts being overridden
Currently, AMD-vi PCI-e passthrough will lead to the following lines in
dmesg:
"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."

After some tracing, the problem is due to the interaction with
amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the
identification of AMD-vi IVHD is done by walking over the ACPI IVRS
table and ivhdX device_ts are added under the acpi bus, while there are
no driver handling the corresponding IOMMU PCI function. In
amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX
device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is
called on ivhdX. the IOMMU pci function device_t is only used for
pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci
function, the IOMMU PCI function device_t's dinfo->cfg.msi is never
updated to reflect the supposed msi_data and msi_addr. So the msi_data
and msi_addr stay in the value 0. When pci_driver_added() tried to loop
over the children of a pci bus, and do pci_cfg_restore() on each of
them, msi_addr and msi_data with value 0 will be written to the MSI
capability of the IOMMU pci function, thus explaining the errors in
dmesg.

This change includes an amdiommu driver which currently does attaching,
detaching and providing DEVMETHODs for setting up and tearing down
interrupt. The purpose of the driver is to prevent pci_driver_added()
from calling pci_cfg_restore() on the IOMMU PCI function device_t.
The introduction of the amdiommu driver handles allocation of an IRQ
resource within the IOMMU PCI function, so that the dinfo->cfg.msi is
populated.

This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	jhb
Approved by:	philip (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28984
2021-03-22 17:33:43 +08:00
John Baldwin
483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
Alex Richardson
4db3ef4c77 More fixes to build the kernel with a compiler that defaults to -fno-common
Using the same approach as the last commit for the files used by genassym.sh.

Obtained from:	CheriBSD
2020-04-18 12:54:40 +00:00
Mark Johnston
b09e7a4f42 Remove more manual additions of -DSMP.
Since r357598 this should no longer be necessary.
2020-02-06 21:01:19 +00:00
Bryan Drewery
ccca101f70 All genassym.sh usage need offset.inc 2018-07-03 21:02:25 +00:00
Bryan Drewery
488adf43d4 Fix cyclic dependency after r326552.
The OBJS_DEPEND_GUESS mechanism was making vmx_genassym.o depend
on all headers along with vmx_assym.h, though vmx_assym.h depends
on having vmx_genassym.o present to generate.  Moving the headers
to DPSRCS is enough to resolve the issue as they will no longer
be implicit dependencies for all objects.  Because of this we
need explicit OBJS_DEPEND_GUESS entries to ensure the headers
are generated when needed for the *_support.o files that need
them.

X-MFC-With:	r326552
MFC after:	2 weeks
Sponsored by:	Dell EMC
2017-12-05 17:23:33 +00:00
Anish Gupta
07ff474a68 Add AMD IOMMU/AMD-Vi support in bhyve for passthrough/direct assignment to VMs. To enable AMD-Vi, set hw.vmm.amdvi.enable=1.
Reviewed by:bcr
Approved by:grehan
Tested by:rgrimes
Differential Revision:https://reviews.freebsd.org/D10049
2017-04-30 02:08:46 +00:00
Enji Cooper
193d9e768b sys/modules: normalize .CURDIR-relative paths to SRCTOP
This simplifies make output/logic

Tested with:	`cd sys/modules; make ALL_MODULES=` on amd64
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
2017-03-04 10:10:17 +00:00
Ed Maste
0e8b3ab348 Exclude -flto when building *genassym.o
The build process generates *assym.h using nm from *genassym.o (which is
in turn created from *genassym.c).

When compiling with link-time optimization (LTO) using -flto, .o files
are LLVM bitcode, not ELF objects. This is not usable by genassym.sh,
so remove -flto from those ${CC} invocations.

Submitted by:	George Rimar
Reviewed by:	dim
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9659
2017-02-21 18:59:17 +00:00
Neel Natu
18a2b08e65 Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when
vmm.ko is loaded.

Also relocate the 'justreturn' IPI handler to be alongside all other handlers.

Requested by:	kib
2015-03-14 02:32:08 +00:00
Neel Natu
0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Warner Losh
528013d5a8 Retire the '@' symlink. It isn't really needed and causes more
problems than it solves. SYSDIR is already defined almost always and
can be used instead. Working around the one case where it isn't is
much easier than working around the fact that @ may not exist in 18
other places.

Differential Revision: https://reviews.freebsd.org/D1100
2014-11-06 16:48:37 +00:00
John Baldwin
b3ee0d3bbd Add foo_genassym.c files to DPSRCS so dependencies for them are generated.
This ensures these objects are rebuilt to generate an updated header of
assembly constants if needed.
2014-10-27 18:37:11 +00:00
Neel Natu
160ef77abf Move the ACPI PM timer emulation into vmm.ko.
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).

Discussed with:	grehan
2014-10-26 04:44:28 +00:00
Neel Natu
e1a172e1c2 IFC @r273214 2014-10-20 02:57:30 +00:00
Warner Losh
b82e2e94e2 Fix build to not bogusly always rebuild vmm.ko.
Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use
and update vmx_support.S's include to match. Add vmx_assym.h to the
SRCS to that it gets properly added to the dependency list. Add
vmx_support.S to SRCS as well, so it gets built and needs fewer
special-case goo. Remove now-redundant special-case goo. Finally,
vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS}
explicitly, that's all taken care of by beforedepend.

With these items fixed, we no longer build vmm.ko every single time
through the modules on a KERNFAST build.

Sponsored by: Netflix
2014-10-17 13:20:49 +00:00
Neel Natu
8f02c5e456 IFC r271888.
Restructure MSR emulation so it is all done in processor-specific code.
2014-09-20 21:46:31 +00:00
Neel Natu
c3498942a5 Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
  switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
  before returning. This is an optimization for MSRs that are not used in
  host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
  the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by:	grehan
2014-09-20 02:35:21 +00:00
Peter Grehan
6cec9cad76 MFC @ r266724
An SVM update will follow this.
2014-06-03 02:34:21 +00:00
Tycho Nightingale
e883c9bb40 Move the atpit device model from userspace into vmm.ko for better
precision and lower latency.

Approved by:	grehan (co-mentor)
2014-03-25 19:20:34 +00:00
Tycho Nightingale
762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Peter Grehan
485ac45a53 MFC @ r259205 in preparation for some SVM updates. (for real this time) 2014-02-04 06:59:08 +00:00
Neel Natu
08e3ff329a Add HPET device emulation to bhyve.
bhyve supports a single timer block with 8 timers. The timers are all 32-bit
and capable of being operated in periodic mode. All timers support interrupt
delivery using MSI. Timers 0 and 1 also support legacy interrupt routing.

At the moment the timers are not connected to any ioapic pins but that will
be addressed in a subsequent commit.

This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).
2013-11-25 19:04:51 +00:00
Neel Natu
565bbb8698 Move the ioapic device model from userspace into vmm.ko. This is needed for
upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.

Discussed with:	grehan@
Submitted by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-11-12 22:51:03 +00:00
Neel Natu
03cd05011f Remove the 'vdev' abstraction that was meant to sit on top of device models
in the kernel. This abstraction was redundant because the only device emulated
inside vmm.ko is the local apic and it is always at a fixed guest physical
address.

Discussed with:	grehan
2013-11-04 23:25:07 +00:00
Peter Grehan
ef90af83a5 IFC @ r255692
Comment out IA32_MISC_ENABLE MSR access - this doesn't exist on AMD.
Need to sort out how arch-specific MSRs will be handled.
2013-09-20 00:46:29 +00:00
Peter Grehan
df5e6de3e3 Add in last remaining files to get AMD-SVM operational.
Submitted by:	Anish Gupta (akgupt3@gmail.com)
2013-08-23 00:37:26 +00:00
Ulrich Spörlein
f44b688c57 Fix 'make depend'. 2013-08-21 08:01:52 +00:00
Neel Natu
b01c203325 Corral all the host state associated with the virtual machine into its own file.
This state is independent of the type of hardware assist used so there is
really no need for it to be in Intel-specific code.

Obtained from:	NetApp
2012-10-29 01:51:24 +00:00
Neel Natu
a2da7af6bc Add support for trapping MMIO writes to local apic registers and emulating them.
The default behavior is still to present the local apic to the guest in the
x2apic mode.
2012-09-25 22:31:35 +00:00
Peter Grehan
d7e0c54045 BHyVe's vmm.ko can now be built with the in-tree binutils.
Many thanks to jhb@ for making this happen.
2012-07-11 23:12:17 +00:00