opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-04 17:05:14 -04:00

Author	SHA1	Message	Date
Konstantin Belousov	dfe7b3bfbc	Move some common code from sys/amd64/amd64/machdep.c and sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most of the code is related to the idle handling. Discussed with: pluknet Sponsored by: The FreeBSD Foundation	2015-04-22 12:32:14 +00:00
Ed Maste	27513bd396	Use explicitly sized types in EFI module metadata This will allow the same metadata struct to be used on all platforms. Differential Revision: https://reviews.freebsd.org/D2275 Reviewed by: jhb	2015-04-10 19:26:45 +00:00
Tycho Nightingale	ef7c2a82ed	Fix "MOVS" instruction memory to MMIO emulation. Currently updates to %rdi, %rsi, etc are inadvertently bypassed along with the check to see if the instruction needs to be repeated per the 'rep' prefix. Add "MOVS" instruction support for the 'MMIO to MMIO' case. Reviewed by: neel	2015-04-01 00:15:31 +00:00
Tycho Nightingale	e4f605ee81	When fetching an instruction in non-64bit mode, consider the value of the code segment base address. Also if an instruction doesn't support a mod R/M (modRM) byte, don't be concerned if the CPU is in real mode. Reviewed by: neel	2015-03-24 17:12:36 +00:00
Konstantin Belousov	0a110d5b17	Use VT-d interrupt remapping block (IR) to perform FSB messages translation. In particular, despite IO-APICs only take 8bit apic id, IR translation structures accept 32bit APIC Id, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc to full int from one byte. KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into interrupt code. The non-PCI(e) devices which generate message interrupts on FSB require special handling. The HPET FSB interrupts are remapped, while DMAR interrupts are not. For each msi and ioapic interrupt source, the iommu cookie is added, which is in fact index of the IRE (interrupt remap entry) in the IR table. Cookie is made at the source allocation time, and then used at the map time to fill both IRE and device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with the special values which are recognized by IR and used to restore the IRE index, to find proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called. Since an interrupt source setup and dismantle code are done in the non-sleepable context, flushing interrupt entries cache in the IR hardware, which is done async and ideally waits for the interrupt, requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is modified to take a boolean argument requesting busy-wait for the written sequence number instead of waiting for interrupt. Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with cookie but before the unit is enabled, but to fix this properly, IR must be started much earlier. Add workarounds for 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Discussed with: jhb Tested by: glebius, pho (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-03-19 13:57:47 +00:00
Neel Natu	18a2b08e65	Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when vmm.ko is loaded. Also relocate the 'justreturn' IPI handler to be alongside all other handlers. Requested by: kib	2015-03-14 02:32:08 +00:00
Konstantin Belousov	4c918926cd	Add x2APIC support. Enable it by default if CPU is capable. The hw.x2apic_enable tunable allows disabling it from the loader prompt. To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded by mfence, except for the EOI notifications. This is probably too strict, only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later. In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions. Note that the patch only switches LAPICs into x2APIC mode. It does not enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupts remapping, but is the required step on the way. Reviewed by: neel Tested by: pho (real hardware), neel (on bhyve) Discussed with: jhb, grehan Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-09 21:00:56 +00:00
Bryan Venteicher	d3ccddf3ce	Generalized parts of the XEN timer code into a generic pvclock KVM clock shares the same data structures between the guest and the host as Xen so it makes sense to just have a single copy of this code. Differential Revision: https://reviews.freebsd.org/D1429 Reviewed by: royger (eariler version) MFC after: 1 month	2015-02-04 08:26:43 +00:00
Neel Natu	7534635359	MOVS instruction emulation. These instructions are emitted by 'bus_space_read_region()' when accessing MMIO regions. Since MOVS can be used with a repeat prefix start decoding the REPZ and REPNZ prefixes. Also start decoding the segment override prefix since MOVS allows overriding the source operand segment register. Tested by: tychon MFC after: 1 week	2015-01-19 06:53:31 +00:00
Neel Natu	d087a39935	Simplify instruction restart logic in bhyve. Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution. Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts. bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly: - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout()) - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch()) Differential Revision: https://reviews.freebsd.org/D1526 Reviewed by: grehan MFC after: 2 weeks	2015-01-18 03:08:30 +00:00
Roger Pau Monné	ca49b3342d	loader: implement multiboot support for Xen Dom0 Implement a subset of the multiboot specification in order to boot Xen and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot implementation is tailored to boot Xen and FreeBSD Dom0, and it will most surely fail to boot any other multiboot compilant kernel. In order to detect and boot the Xen microkernel, two new file formats are added to the bootloader, multiboot and multiboot_obj. Multiboot support must be tested before regular ELF support, since Xen is a multiboot kernel that also uses ELF. After a multiboot kernel is detected, all the other loaded kernels/modules are parsed by the multiboot_obj format. The layout of the loaded objects in memory is the following; first the Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to long mode by itself), after that the FreeBSD kernel is loaded as a RAW file (Xen will parse and load it using it's internal ELF loader), and finally the metadata and the modules are loaded using the native FreeBSD way. After everything is loaded we jump into Xen's entry point using a small trampoline. The order of the multiboot modules passed to Xen is the following, the first module is the RAW FreeBSD kernel, and the second module is the metadata and the FreeBSD modules. Since Xen will relocate the memory position of the second multiboot module (the one that contains the metadata and native FreeBSD modules), we need to stash the original modulep address inside of the metadata itself in order to recalculate its position once booted. This also means the metadata must come before the loaded modules, so after loading the FreeBSD kernel a portion of memory is reserved in order to place the metadata before booting. In order to tell the loader to boot Xen and then the FreeBSD kernel the following has to be added to the /boot/loader.conf file: xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga" xen_kernel="/boot/xen" The first argument contains the command line that will be passed to the Xen kernel, while the second argument is the path to the Xen kernel itself. This can also be done manually from the loader command line, by for example typing the following set of commands: OK unload OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga OK load kernel OK load zfs OK load if_tap OK load ... OK boot Sponsored by: Citrix Systems R&D Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D517 For the Forth bits: Submitted by: Julien Grall <julien.grall AT citrix.com>	2015-01-15 16:27:20 +00:00
Neel Natu	c9c75df48c	'struct vm_exception' was intended to be used only as the collateral for the VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping track pending exceptions for a vcpu. This in turn causes confusion because some fields in 'struct vm_exception' like 'vcpuid' make sense only in the ioctl context. It also makes it harder to add or remove structure fields. Fix this by using 'struct vm_exception' only to communicate information from userspace to vmm.ko when injecting an exception. Also, add a field 'restart_instruction' to 'struct vm_exception'. This field is set to '1' for exceptions where the faulting instruction is restarted after the exception is handled. MFC after: 1 week	2015-01-13 22:00:47 +00:00
Konstantin Belousov	b5243bd4f7	Revert r276600: PHYS_TO_DMAP_RAW() and DMAP_TO_PHYS_RAW() are no longer used, remove them. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:50:55 +00:00
Konstantin Belousov	b1752aa0ea	For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:36:25 +00:00
Mark Johnston	bdb9ab0dd9	Factor out duplicated code from dumpsys() on each architecture into generic code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division	2015-01-07 01:01:39 +00:00
Konstantin Belousov	a40e51e355	For /dev/mem and /dev/kmem accesses, avoid asserting that addresses are within direct map. We want to return error instead of panicing. PR: 194995 Sponsored by: The FreeBSD Foundation	2015-01-03 01:28:58 +00:00
Alan Cox	d866a563d4	The physical memory allocator supports the use of distinct free lists for managing pages from different address ranges. Generally speaking, this feature is used to increase the likelihood that physical pages are available that can meet special DMA requirements or can be accessed through a limited-coverage direct mapping (e.g., MIPS). However, prior to this change, the configuration of the free lists was static, i.e., it was determined at compile time. Consequentally, free lists could be created for address ranges that held no actual pages, for example, on 32-bit MIPS- based systems with 512 MB or less of physical memory. This change makes the creation of the free lists dynamic, i.e., it is based on the available physical memory at boot time. On 64-bit x86-based systems with 64 GB or more of physical memory, create free lists for managing pages with physical addresses below 4 GB. This change is to address reported problems with initializing devices that require the allocation of physical pages below 4 GB on some systems with 128 GB or more of physical memory. PR: 185727 Differential Revision: https://reviews.freebsd.org/D1274 Reviewed by: jhb, kib MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-12-31 00:54:38 +00:00
Neel Natu	0dafa5cd4b	Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest. Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it. The RTC device state can be inspected via bhyvectl as follows: bhyvectl --vm=vm --get-rtc-time bhyvectl --vm=vm --set-rtc-time=<unix_time_secs> bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value> Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D1385 MFC after: 2 weeks	2014-12-30 22:19:34 +00:00
Neel Natu	b053814333	Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions". To enable this feature set the tunable to "1" before loading vmm.ko. Tracing the guest exceptions can be useful when debugging guest triple faults. Note that there is a performance impact when exception tracing is enabled since every exception will now trigger a VM-exit. Also, handle machine check exceptions that happen during guest execution by vectoring to the host's machine check handler via "int $18". Discussed with: grehan MFC after: 2 weeks	2014-12-23 02:14:49 +00:00
Ed Maste	294246bb7d	Revert r274772: it is not valid on MIPS Reported by: sbruno	2014-11-25 03:50:31 +00:00
Ed Maste	688fd61ae8	Use canonical __PIC__ flag It is automatically set when -fPIC is passed to the compiler. Reviewed by: dim, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D1179	2014-11-21 02:05:48 +00:00
Scott Long	18e3d9f521	Extend earlier addition of stack frames to most of support.S. This makes stack traces in KDB, HWPMC, and DTrace much more reliable and useful. Reviewed by: kan, kib Obtained from: Netflix, Inc. MFC after: 5 days	2014-11-13 22:11:44 +00:00
Ed Maste	96699e86a3	Add workaround for vt efifb's early use of PHYS_TO_DMAP In vt_efifb_init the framebuffer's physaddr is passed to PHYS_TO_DMAP before the DMAP is setup. The result is not actually accessed until after the mapping is setup, though. Loosen the assertion in PHYS_TO_DMAP for now, to allow use when dmaplimit == 0. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D1142	2014-11-11 14:59:46 +00:00
John Baldwin	01e1933dcc	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
Neel Natu	160ef77abf	Move the ACPI PM timer emulation into vmm.ko. This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime keep track of time). Discussed with: grehan	2014-10-26 04:44:28 +00:00
Roger Pau Monné	927dc0e02a	amd64: make uiomove_fromphys functional for pages not mapped by the DMAP Place the code introduced in r268660 into a separate function that can be called from uiomove_fromphys. Instead of pre-allocating two KVA pages use vmem_alloc to allocate them on demand when needed. This prevents blocking if a page fault is taken while physical addresses from outside the DMAP are used, since the lock is now removed. Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D947 amd64/amd64/pmap.c: - Factor out the code to deal with non DMAP addresses from pmap_copy_pages and place it in pmap_map_io_transient. - Change the code to use vmem_alloc instead of a set of pre-allocated pages. - Use pmap_qenter and don't pin the thread if there can be page faults. amd64/amd64/uio_machdep.c: - Use pmap_map_io_transient in order to correctly deal with physical addresses not covered by the DMAP. amd64/include/pmap.h: - Add the prototypes for the new functions. amd64/include/vmparam.h: - Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only used with addresses covered by the DMAP.	2014-10-24 09:48:58 +00:00
Roger Pau Monné	bf7313e3b7	xen: implement the privcmd user-space device This device is only attached to priviledged domains, and allows the toolstack to interact with Xen. The two functions of the privcmd interface is to allow the execution of hypercalls from user-space, and the mapping of foreign domain memory. Sponsored by: Citrix Systems R&D i386/include/xen/hypercall.h: amd64/include/xen/hypercall.h: - Introduce a function to make generic hypercalls into Xen. xen/interface/xen.h: xen/interface/memory.h: - Import the new hypercall XENMEM_add_to_physmap_range used by auto-translated guests to map memory from foreign domains. dev/xen/privcmd/privcmd.c: - This device has the following functions: - Allow user-space applications to make hypercalls into Xen. - Allow user-space applications to map memory from foreign domains, this is accomplished using the newly introduced hypercall (XENMEM_add_to_physmap_range). xen/privcmd.h: - Public ioctl interface for the privcmd device. x86/xen/hvm.c: - Remove declaration of hypercall_page, now it's declared in hypercall.h. conf/files: - Add the privcmd device to the build process.	2014-10-22 17:07:20 +00:00
Neel Natu	ed6aacb51f	IFC @r272887	2014-10-10 23:52:56 +00:00
Mark Johnston	5eaae1411f	Pass up the error status of minidumpsys() to its callers. PR: 193761 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division	2014-10-08 20:25:21 +00:00
Konstantin Belousov	07a92f34d6	Add an argument to the x86 pmap_invalidate_cache_range() to request forced invalidation of the cache range regardless of the presence of self-snoop feature. Some recent Intel GPUs in some modes are not coherent, and dirty lines in CPU cache must be flushed before the pages are transferred to GPU domain. Reviewed by: alc (previous version) Tested by: pho (amd64) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-08 16:48:03 +00:00
Neel Natu	65145c7f50	Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'. The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to be present anyways. Discussed with: grehan	2014-10-06 20:48:01 +00:00
Neel Natu	8f02c5e456	IFC r271888. Restructure MSR emulation so it is all done in processor-specific code.	2014-09-20 21:46:31 +00:00
Neel Natu	c3498942a5	Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality. The VT-x code has the following types of MSRs: - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE). - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE). - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE). Reviewed by: grehan	2014-09-20 02:35:21 +00:00
Neel Natu	4e27d36d38	IFC @r271694	2014-09-17 18:46:51 +00:00
Neel Natu	bbadcde418	Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction. Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability. Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging. Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8). Reviewed by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan	2014-09-14 04:39:04 +00:00
Neel Natu	c2a875f970	AMD processors that have the SVM decode assist capability will store the instruction bytes in the VMCB on a nested page fault. This is useful because it saves having to walk the guest page tables to fetch the instruction. vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len' that map directly to 'vie->inst[]' and 'vie->num_valid'. The instruction emulation handler skips calling 'vmm_fetch_instruction()' if 'vie->num_valid' is non-zero. The use of this capability can be turned off by setting the sysctl/tunable 'hw.vmm.svm.disable_npf_assist' to '1'. Reviewed by: Anish Gupta (akgupt3@gmail.com) Discussed with: grehan	2014-09-13 22:16:40 +00:00
Neel Natu	d181963296	Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow. The hypervisor is done "executing" the HLT and by definition this moves the vcpu out of the 1-instruction interrupt shadow. Prior to this change the interrupt would be held pending because the VMCS guest-interruptibility-state would indicate that "blocking by STI" was in effect. This resulted in an unnecessary round trip into the guest before the pending interrupt could be injected. Reviewed by: grehan	2014-09-12 06:15:20 +00:00
John Baldwin	b1d735ba4c	Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code. Reviewed by: kib (earlier version)	2014-09-06 15:23:28 +00:00
John Baldwin	7fb40488d6	- Move prototypes for various functions into out of C files and into <machine/md_var.h>. - Move some CPU-related variables out of i386/i386/identcpu.c to initcpu.c to match amd64. - Move the declaration of has_f00f_hack out of identcpu.c to machdep.c. - Remove a misleading comment from i386/i386/initcpu.c (locore zeros the BSS before it calls identify_cpu()) and remove explicit zero assignments to reduce the diff with amd64.	2014-09-04 01:46:06 +00:00
John Baldwin	89871cdeb6	- Add a new structure type for the ACPI 3.0 SMAP entry that includes the optional attributes field. - Add a 'machdep.smap' sysctl that exports the SMAP table of the running system as an array of the ACPI 3.0 structure. (On older systems, the attributes are given a value of zero.) Note that the sysctl only exports the SMAP table if it is available via the metadata passed from the loader to the kernel. If an SMAP is not available, an empty array is returned. - Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8) binary to format the SMAP structures in a readable format similar to the format found in boot messages. MFC after: 2 weeks	2014-08-29 21:25:47 +00:00
Peter Grehan	7f21538b6e	Change __inline style to be consistent with FreeBSD usage, and also fix gcc build (on STABLE, when MFCd). PR: 192880 Reviewed by: neel Reported by: ngie MFC after: 1 day	2014-08-24 02:07:34 +00:00
John Baldwin	64d6de263b	Bump MAXCPU on amd64 from 64 to 256. In practice APIC only permits 255 CPUs (IDs 0 through 254). Getting above that limit requires x2APIC. MFC after: 1 month	2014-08-20 16:06:24 +00:00
Konstantin Belousov	3165194c6b	Increase max number of physical segments on amd64 to 63. Eventually, the vmd_segs of the struct vm_domain should become bitset instead of long, to allow arbitrary compile-time selected maximum. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-20 08:07:08 +00:00
Gleb Smirnoff	c8d2ffd6a7	Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c The MD allocators were very common, however there were some minor differencies. These differencies were all consolidated in the MI allocator, under ifdefs. The defines from machine/vmparam.h turn on features required for a particular machine. For details look in the comment in sys/sf_buf.h. As result no MD code left in sys///vm_machdep.c. Some arches still have machine/sf_buf.h, which is usually quite small. Tested by: glebius (i386), tuexen (arm32), kevlo (arm32) Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-05 09:44:10 +00:00
Neel Natu	f008d1571d	If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps forever in vm_handle_hlt(). This is usually not an issue as long as one of the other vcpus properly resets or powers off the virtual machine. However, if the bhyve(8) process is killed with a signal the halted vcpu cannot be woken up because it's sleep cannot be interrupted. Fix this by waking up periodically and returning from vm_handle_hlt() if TDF_ASTPENDING is set. Reported by: Leon Dang Sponsored by: Nahanni Systems	2014-07-26 02:53:51 +00:00
Neel Natu	d37f2adb38	Fix fault injection in bhyve. The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced. A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm ' in kernel or 'struct vmctx ' in userspace needs to be loosely typed as a 'void *'.	2014-07-24 01:38:11 +00:00
Neel Natu	d665d229ce	Emulate instructions emitted by OpenBSD/i386 version 5.5: - CMP REG, r/m - MOV AX/EAX/RAX, moffset - MOV moffset, AX/EAX/RAX - PUSH r/m	2014-07-23 04:28:51 +00:00
Neel Natu	091d453222	Handle nested exceptions in bhyve. A nested exception condition arises when a second exception is triggered while delivering the first exception. Most nested exceptions can be handled serially but some are converted into a double fault. If an exception is generated during delivery of a double fault then the virtual machine shuts down as a result of a triple fault. vm_exit_intinfo() is used to record that a VM-exit happened while an event was being delivered through the IDT. If an exception is triggered while handling the VM-exit it will be treated like a nested exception. vm_entry_intinfo() is used by processor-specific code to get the event to be injected into the guest on the next VM-entry. This function is responsible for deciding the disposition of nested exceptions.	2014-07-19 20:59:08 +00:00
Neel Natu	3d5444c864	Add emulation for legacy x86 task switching mechanism. FreeBSD/i386 uses task switching to handle double fault exceptions and this change enables that to work. Reported by: glebius	2014-07-16 21:26:26 +00:00
Neel Natu	f7a9f1784f	Add support for operand size and address size override prefixes in bhyve's instruction emulation [1]. Fix bug in emulation of opcode 0x8A where the destination is a legacy high byte register and the guest vcpu is in 32-bit mode. Prior to this change instead of modifying %ah, %bh, %ch or %dh the emulation would end up modifying %spl, %bpl, %sil or %dil instead. Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value during instruction decoding. Fix bug in verify_gla() where the linear address computed after decoding the instruction was not being truncated to the effective address size [2]. Tested by: Leon Dang [1] Reported by: Peter Grehan [2] Sponsored by: Nahanni Systems	2014-07-15 17:37:17 +00:00
Neel Natu	b301b9e28f	Accurately identify the vcpu's operating mode as 64-bit, compatibility, protected or real.	2014-07-08 21:48:57 +00:00
Konstantin Belousov	633034fe0e	Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU context into memory for the kernel threads which called fpu_kern_thread(9). This allows the fpu_kern_enter() callers to not check for is_fpu_kern_thread() to get the optimization. Apply the flag to padlock(4) and aesni(4). In aesni_cipher_process(), do not leak FPU context state on error. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-23 07:37:54 +00:00
Roger Pau Monné	ef409ede7b	amd64/i386: introduce APIC hooks for different APIC implementations. This is needed for Xen PV(H) guests, since there's no hardware lapic available on this kind of domains. This commit should not change functionality. Sponsored by: Citrix Systems R&D Reviewed by: jhb Approved by: gibbs amd64/include/cpu.h: amd64/amd64/mp_machdep.c: i386/include/cpu.h: i386/i386/mp_machdep.c: - Remove lapic_ipi_vectored hook from cpu_ops, since it's now implemented in the lapic hooks. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Use lapic_ipi_vectored directly, since it's now an inline function that will call the appropiate hook. x86/x86/local_apic.c: - Prefix bare metal public lapic functions with native_ and mark them as static. - Define default implementation of apic_ops. x86/include/apicvar.h: - Declare the apic_ops structure and create inline functions to access the hooks, so the change is transparent to existing users of the lapic_ functions. x86/xen/hvm.c: - Switch to use the new apic_ops.	2014-06-16 08:43:03 +00:00
Tycho Nightingale	5ebc578ba6	Replace enum forward declarations with complete definitions. Reviewed by: neel	2014-06-10 18:46:00 +00:00
Neel Natu	404874659f	Add helper functions to populate VM exit information for rendezvous and astpending exits. This is to reduce code duplication between VT-x and SVM implementations.	2014-06-10 16:45:58 +00:00
Neel Natu	5fcf252f41	Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained by vmm.ko. This allows the virtual machine to be restarted without having to destroy it first. Reviewed by: grehan	2014-06-07 21:36:52 +00:00
Neel Natu	95ebc360ef	Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.	2014-05-31 23:37:34 +00:00
Neel Natu	65ffa035a7	Add segment protection and limits violation checks in vie_calculate_gla() for 32-bit x86 guests. Tested using ins/outs executed in a FreeBSD/i386 guest.	2014-05-27 04:26:22 +00:00
Neel Natu	5382c19d81	Do the linear address calculation for the ins/outs emulation using a new API function 'vie_calculate_gla()'. While the current implementation is simplistic it forms the basis of doing segmentation checks if the guest is in 32-bit protected mode.	2014-05-25 00:57:24 +00:00
Neel Natu	da11f4aa1d	Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out of the guest linear address space. These APIs in turn use a new ioctl 'VM_GLA2GPA' to convert the guest linear address to guest physical. Use the new copyin/copyout APIs when emulating ins/outs instruction in bhyve(8).	2014-05-24 23:12:30 +00:00
Neel Natu	e813a87350	Consolidate all the information needed by the guest page table walker into 'struct vm_guest_paging'. Check for canonical addressing in vmm_gla2gpa() and inject a protection fault into the guest if a violation is detected. If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to point to the root of the page tables.	2014-05-24 20:26:57 +00:00
Neel Natu	37a723a5b3	When injecting a page fault into the guest also update the guest's %cr2 to indicate the faulting linear address. If the guest PML4 entry has the PG_PS bit set then inject a page fault into the guest with the PGEX_RSV bit set in the error_code. Get rid of redundant checks for the PG_RW violations when walking the page tables.	2014-05-24 19:13:25 +00:00
Neel Natu	a7424861fb	Check for alignment check violation when processing in/out string instructions.	2014-05-23 19:59:14 +00:00
Neel Natu	d17b5104a9	Add emulation of the "outsb" instruction. NetBSD guests use this to write to the UART FIFO. The emulation is constrained in a number of ways: 64-bit only, doesn't check for all exception conditions, limited to i/o ports emulated in userspace. Some of these constraints will be relaxed in followup commits. Requested by: grehan Reviewed by: tychon (partially and a much earlier version)	2014-05-23 05:15:17 +00:00
Neel Natu	fd949af642	Inject page fault into the guest if the page table walker detects an invalid translation for the guest linear address.	2014-05-22 03:14:54 +00:00
Neel Natu	e4c8a13d61	Add PG_U (user/supervisor) checks when translating a guest linear address to a guest physical address. PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now checked only in non-terminal paging entries. Ignore the upper 32-bits of the CR3 for PAE paging.	2014-05-19 03:50:07 +00:00
John Baldwin	b3e9732a76	Implement a PCI interrupt router to route PCI legacy INTx interrupts to the legacy 8259A PICs. - Implement an ICH-comptabile PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b. - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted. - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware. - Add a global _PIC method for OSPM to select the desired interrupt routing configuration. - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided. - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin. - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR. - Mark the power management SCI as level triggered. - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements. Reviewed by: tycho	2014-05-15 14:16:55 +00:00
Neel Natu	f3db4c53e6	Increase the TSS limit by one byte. The processor requires an additional byte with all bits set to 1 beyond the I/O permission bitmap. Prior to this change accessing I/O ports [0xFFF8-0xFFFF] would trigger a #GP fault even though the I/O bitmap allowed access to those ports. For more details see section "I/O Permission Bit Map" in the Intel SDM, Vol 1. Reviewed by: kib	2014-05-14 22:24:09 +00:00
Neel Natu	e50ce2aa06	Add logic in the HLT exit handler to detect if the guest has put all vcpus to sleep permanently by executing a HLT with interrupts disabled. When this condition is detected the guest with be suspended with a reason of VM_SUSPEND_HALT and the bhyve(8) process will exit. Tested by executing "halt" inside a RHEL7-beta guest. Discussed with: grehan@ Reviewed by: jhb@, tychon@	2014-05-02 00:33:56 +00:00
Neel Natu	c6a0cc2e21	Some Linux guests will implement a 'halt' by disabling the APIC and executing the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET). This functionality was broken in r263780 in a way that made it impossible to kill the bhyve(8) process because it would loop forever in vm_handle_suspend(). Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from a Linux guest will appear to be hung but this is consistent with the behavior on bare metal. The guest can be rebooted by using the bhyvectl options '--force-reset' or '--force-poweroff'. Reviewed by: grehan@	2014-04-29 18:42:56 +00:00
Neel Natu	f0fdcfe247	Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF. The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'. This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl. Discussed with: grehan@	2014-04-28 22:06:40 +00:00
Ed Maste	d1d4f00e9a	Update EFI framebuffer handoff from loader Sponsored by: The FreeBSD Foundation	2014-03-27 19:43:38 +00:00
Ed Maste	c018226f1a	amd64: Parse the EFI memory map if present With this change (and loader.efi from the projects/uefi branch) we can now boot under qemu using the OVMF UEFI firmware image with the limitation that a serial console is required. (This is largely r246337 from the projects/uefi branch.) Sponsored by: The FreeBSD Foundation	2014-03-27 18:23:02 +00:00
Neel Natu	b15a09c05e	Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called from any context i.e., it is not required to be called from a vcpu thread. The ioctl simply sets a state variable 'vm->suspend' to '1' and returns. The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The suspend handler waits until all 'vm->active_cpus' have transitioned to 'vm->suspended_cpus' before returning to userspace. Discussed with: grehan	2014-03-26 23:34:27 +00:00
Tycho Nightingale	e883c9bb40	Move the atpit device model from userspace into vmm.ko for better precision and lower latency. Approved by: grehan (co-mentor)	2014-03-25 19:20:34 +00:00
Konstantin Belousov	1fb1da0366	Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 17:17:19 +00:00
Tycho Nightingale	0775fbb475	Fix a race wherein the source of an interrupt vector is wrongly attributed if an ExtINT arrives during interrupt injection. Also, fix a spurious interrupt if the PIC tries to raise an interrupt before the outstanding one is accepted. Finally, improve the PIC interrupt latency when another interrupt is raised immediately after the outstanding one is accepted by creating a vmexit rather than waiting for one to occur by happenstance. Approved by: neel (co-mentor)	2014-03-15 23:09:34 +00:00
Tycho Nightingale	762fd20804	Replace the userspace atpic stub with a more functional vmm.ko model. New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state. Reviewed by: jhb, neel Approved by: neel (co-mentor)	2014-03-11 16:56:00 +00:00
Roger Pau Monné	079f7ef839	xen: add a hook to perform AP startup AP startup on PVH follows the PV method, so we need to add a hook in order to diverge from bare metal. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/machdep.c: - Add hook for start_all_aps on native (using native_start_all_aps defined in mp_machdep). amd64/amd64/mp_machdep.c: - Make some variables global because they will also be used by the Xen PVH AP startup code. - Use the start_all_aps hook to start APs. - Rename start_all_aps to native_start_all_aps. amd64/include/smp.h: - Add declaration for native_start_all_aps. x86/include/init.h: - Declare start_all_aps hook in init_ops. x86/xen/pv.c: - Pick external declarations from mp_machdep. - Introduce Xen PV code to start APs on PVH. - Set start_all_aps init hook to use the Xen PVH implementation.	2014-03-11 10:27:57 +00:00
Roger Pau Monné	4d30a3fb95	xen: use the same hypercall mechanism for XEN and XENHVM Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to issue hypercalls, unify this by filling the hypercall_page under HVM also. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/include/xen/hypercall.h: - Unify Xen hypercall code by always using the PV way. i386/i386/locore.s: - Define hypercall_page on i386 XENHVM. x86/xen/hvm.c: - Fill hypercall_page on XENHVM kernels using the HVM method (only when running as an HVM guest).	2014-03-11 10:24:13 +00:00
Roger Pau Monné	1e69553ed1	xen: implement hook to fetch and parse e820 memory map e820 memory map is fetched using a hypercall under Xen PVH, so add a hook to init_ops in oder to diverge from bare metal and implement a Xen variant. Approved by: gibbs Sponsored by: Citrix Systems R&D x86/include/init.h: - Add a parse_memmap hook to init_ops, that will be called to fetch and parse the memory map. amd64/amd64/machdep.c: - Decouple the fetch and the parse of the memmap, so the parse function can be shared with Xen code. - Move code around in order to implement the parse_memmap hook. amd64/include/pc/bios.h: - Declare bios_add_smap_entries (implemented in machdep.c). x86/xen/pv.c: - Implement fetching of e820 memmap when running as a PVH guest by using the XENMEM_memory_map hypercall.	2014-03-11 10:23:03 +00:00
Roger Pau Monné	5f05c79450	xen: implement an early timer for Xen PVH When running as a PVH guest, there's no emulated i8254, so we need to use the Xen PV timer as the early source for DELAY. This change allows for different implementations of the early DELAY function and implements a Xen variant for it. Approved by: gibbs Sponsored by: Citrix Systems R&D dev/xen/timer/timer.c: dev/xen/timer/timer.h: - Implement Xen early delay functions using the PV timer and declare them. x86/include/init.h: - Add hooks for early clock source initialization and early delay functions. i386/i386/machdep.c: pc98/pc98/machdep.c: amd64/amd64/machdep.c: - Set early delay hooks to use the i8254 on bare metal. - Use clock_init (that will in turn make use of init_ops) to initialize the early clock source. amd64/include/clock.h: i386/include/clock.h: - Declare i8254_delay and clock_init. i386/xen/clock.c: - Rename DELAY to i8254_delay. x86/isa/clock.c: - Introduce clock_init that will take care of initializing the early clock by making use of the init_ops hooks. - Move non ISA related delay functions to the newly introduced delay file. x86/x86/delay.c: - Add moved delay related functions. - Implement generic DELAY function that will use the init_ops hooks. x86/xen/pv.c: - Set PVH hooks for the early delay related functions in init_ops. conf/files.amd64: conf/files.i386: conf/files.pc98: - Add delay.c to the kernel build.	2014-03-11 10:20:42 +00:00
Roger Pau Monné	1a9cdd373a	xen: add PV/PVH kernel entry point Add the PV/PVH entry point and the low level functions for PVH early initialization. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/genassym.c: - Add __FreeBSD_version define to assym.s so it can be used for the Xen notes. amd64/amd64/locore.S: - Make bootstack global so it can be used from Xen kernel entry point. amd64/amd64/xen-locore.S: - Add Xen notes to the kernel. - Add the Xen PV entry point, that is going to call hammer_time_xen. amd64/include/asmacros.h: - Add ELFNOTE macros. i386/xen/xen_machdep.c: - Define HYPERVISOR_start_info for the XEN i386 PV port, which is going to be used in some shared code between PV and PVH. x86/xen/hvm.c: - Define HYPERVISOR_start_info for the PVH port. x86/xen/pv.c: - Introduce hammer_time_xen which is going to perform early setup for Xen PVH: - Setup shared Xen variables start_info, shared_info and xen_store. - Set guest type. - Create initial page tables as FreeBSD expects to find them. - Call into native init function (hammer_time). xen/xen-os.h: - Declare HYPERVISOR_start_info. conf/files.amd64: - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.	2014-03-11 10:07:01 +00:00
Roger Pau Monné	e8da1c4877	amd64/i386: switch IPI handlers to C code. Move asm IPIs handlers to C code, so both Xen and native IPI handlers share the same code. Reviewed by: jhb Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/apic_vector.S: i386/i386/apic_vector.s: - Remove asm coded IPI handlers and instead call the newly introduced C variants. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Add C coded clones to the asm IPI handlers (moved from x86/xen/hvm.c). i386/include/smp.h: amd64/include/smp.h: - Add prototypes for the C IPI handlers. x86/xen/hvm.c: - Move the C IPI handlers to mp_machdep and call those in the Xen IPI handlers. i386/xen/mp_machdep.c: - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't support SMP).	2014-03-11 10:03:29 +00:00
Jung-uk Kim	1d22d877b8	Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c. Inspired by: jhb	2014-03-04 21:35:57 +00:00
Neel Natu	dc50650607	Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest. The order of event injection into the guest is: - hardware exception - NMI - maskable interrupt In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective. Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR. Discussed with: grehan, jhb Reviewed by: jhb	2014-02-26 00:52:05 +00:00
Neel Natu	52e5c8a2ec	Simplify APIC mode switching from MMIO to x2APIC. In part this is done to simplify the implementation of the x2APIC virtualization assist in VT-x. Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot. Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest. Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode. The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead. Discussed with: grehan@	2014-02-20 01:48:25 +00:00
Andriy Gapon	f25e50cf0b	provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64 Reviewed by: jhb MFC after: 10 days X-MFC note: consider thirdparty modules depending on these symbols Sponsored by: HybridCluster	2014-02-14 15:18:37 +00:00
John Baldwin	4edef187b8	Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type. - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain. - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device. - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range. - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib). - Define a PCI_RES_BUS resource type for amd64 and i386. Reviewed by: imp MFC after: 1 month	2014-02-12 04:30:37 +00:00
John Baldwin	00f3efe1bd	Add support for FreeBSD/i386 guests under bhyve. - Similar to the hack for bootinfo32.c in userboot, define _MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot. This allows userboot to load 32-bit kernels and modules. - Copy the SMAP generation code out of bootinfo64.c and into its own file so it can be shared with bootinfo32.c to pass an SMAP to the i386 kernel. - Use uint32_t instead of u_long when aligning module metadata in bootinfo32.c in userboot, as otherwise the metadata used 64-bit alignment which corrupted the layout. - Populate the basemem and extmem members of the bootinfo struct passed to 32-bit kernels. - Fix the 32-bit stack in userboot to start at the top of the stack instead of the bottom so that there is room to grow before the kernel switches to its own stack. - Push a fake return address onto the 32-bit stack in addition to the arguments normally passed to exec() in the loader. This return address is needed to convince recover_bootinfo() in the 32-bit locore code that it is being invoked from a "new" boot block. - Add a routine to libvmmapi to setup a 32-bit flat mode register state including a GDT and TSS that is able to start the i386 kernel and update bhyveload to use it when booting an i386 kernel. - Use the guest register state to determine the CPU's current instruction mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long mode) in the instruction emulation code. Update the gla2gpa() routine used when fetching instructions to handle flat mode, 32-bit paging, and PAE paging in addition to long mode paging. Don't look for a REX prefix when the CPU is in 32-bit mode, and use the detected mode to enable the existing 32-bit mode code when decoding the mod r/m byte. Reviewed by: grehan, neel MFC after: 1 month	2014-02-05 04:39:03 +00:00
John Baldwin	3cbf3585cb	Enhance the support for PCI legacy INTx interrupts and enable them in the virtio backends. - Add a new ioctl to export the count of pins on the I/O APIC from vmm to the hypervisor. - Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for ISA interrupts. - Populate the MP Table with I/O interrupt entries for any PCI INTx interrupts. - Create a _PRT table under the PCI root bridge in ACPI to route any PCI INTx interrupts appropriately. - Track which INTx interrupts are in use per-slot so that functions that share a slot attempt to distribute their INTx interrupts across the four available pins. - Implicitly mask INTx interrupts if either MSI or MSI-X is enabled and when the INTx DIS bit is set in a function's PCI command register. Either assert or deassert the associated I/O APIC pin when the state of one of those conditions changes. - Add INTx support to the virtio backends. - Always advertise the MSI capability in the virtio backends. Submitted by: neel (7) Reviewed by: neel MFC after: 2 weeks	2014-01-29 14:56:48 +00:00
Neel Natu	30b94db8c0	Support level triggered interrupts with VT-x virtual interrupt delivery. The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector. If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger an EOI-induced VM-exit when it is doing EOI virtualization. The EOI-induced VM-exit results in the EOI being forwarded to the vioapic so that level triggered interrupts can be properly handled. Tested by: Anish Gupta (akgupt3@gmail.com)	2014-01-25 20:58:05 +00:00
John Baldwin	e07ef9b0f6	Move <machine/apicvar.h> to <x86/apicvar.h>.	2014-01-23 20:10:22 +00:00
Neel Natu	5b8a8cd1fe	Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous can be initiated in the context of a vcpu thread or from the bhyve(8) control process. The first use of this functionality is to update the vlapic trigger-mode register when the IOAPIC pin configuration is changed. Prior to this change we would update the TMR in the virtual-APIC page at the time of interrupt delivery. But this doesn't work with Posted Interrupts because there is no way to program the EOI_exit_bitmap[] in the VMCS of the target at the time of interrupt delivery. Discussed with: grehan@	2014-01-14 01:55:58 +00:00
Neel Natu	add611fd4c	Don't expose 'vmm_ipinum' as a global.	2014-01-09 03:25:54 +00:00
Neel Natu	0492757c70	Restructure the VMX code to enter and exit the guest. In large part this change hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used to enter guest context and vmx_exit_guest() is used to transition back into host context. Fix a longstanding race where a vcpu interrupt notification might be ignored if it happens after vmx_inject_interrupts() but before host interrupts are disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with host interrupts disabled to prevent this. Suggested by: grehan@	2014-01-01 21:17:08 +00:00
Neel Natu	de5ea6b65e	vlapic code restructuring to make it easy to support hardware-assist for APIC emulation. The vlapic initialization and cleanup is done via processor specific vmm_ops. This will allow the VT-x/SVM modules to layer any hardware-assist for APIC emulation or virtual interrupt delivery on top of the vlapic device model. Add a parameter to 'vcpu_notify_event()' to distinguish between vlapic interrupts versus other events (e.g. NMI). This provides an opportunity to use hardware-assists like Posted Interrupts (VT-x) or doorbell MSR (SVM) to deliver an interrupt to a guest without causing a VM-exit. Get rid of lapic_pending_intr() and lapic_intr_accepted() and use the vlapic_xxx() counterparts directly. Associate an 'Apic Page' with each vcpu and reference it from the 'vlapic'. The 'Apic Page' is intended to be referenced from the Intel VMCS as the 'virtual APIC page' or from the AMD VMCB as the 'vAPIC backing page'.	2013-12-25 06:46:31 +00:00
John Baldwin	63e62d390d	Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend. Reviewed by: neel	2013-12-23 19:48:22 +00:00
John Baldwin	330baf58c6	Extend the support for local interrupts on the local APIC: - Add a generic routine to trigger an LVT interrupt that supports both fixed and NMI delivery modes. - Add an ioctl and bhyvectl command to trigger local interrupts inside a guest. In particular, a global NMI similar to that raised by SERR# or PERR# can be simulated by asserting LINT1 on all vCPUs. - Extend the LVT table in the vCPU local APIC to support CMCI. - Flesh out the local APIC error reporting a bit to cache errors and report them via ESR when ESR is written to. Add support for asserting the error LVT when an error occurs. Raise illegal vector errors when attempting to signal an invalid vector for an interrupt or when sending an IPI. - Ignore writes to reserved bits in LVT entries. - Export table entries the MADT and MP Table advertising the stock x86 config of LINT0 set to ExtInt and LINT1 wired to NMI. Reviewed by: neel (earlier version)	2013-12-23 19:29:07 +00:00
Neel Natu	f80330a820	Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE state before the requested state transition. This guarantees that there is exactly one ioctl() operating on a vcpu at any point in time and prevents unintended state transitions. More details available here: http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html Reviewed by: grehan Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com) MFC after: 3 days	2013-12-22 20:29:59 +00:00

1 2 3 4 5 ...

1857 commits