The correct condition is to check that the number of ivhd entries fits
into the array.
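
A minimal sketch of this kind of bounds check, for illustration only.
MAX_HDRS, struct hdr_table and hdr_add() are hypothetical names and are
not taken from the driver:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HDRS 10	/* hypothetical fixed capacity */

struct hdr_table {
	const void *hdrs[MAX_HDRS];
	size_t count;	/* number of slots currently in use */
};

/*
 * Append an entry only if it still fits.  The count is compared
 * against the capacity before the store, so the write can never land
 * past the end of the array.
 */
static bool
hdr_add(struct hdr_table *t, const void *hdr)
{
	if (t->count >= MAX_HDRS)
		return (false);
	t->hdrs[t->count++] = hdr;
	return (true);
}
```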
Reported by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31514
(cherry picked from commit 179bc5729d)
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
Reviewed by: kib, markj
(cherry picked from commit d411b285bc)
pmap_copy() is used to speculatively create mappings, so those mappings
should not have their access bit preset.
Reviewed by: kib, markj
(cherry picked from commit 325ff93274)
As follow-on work to e4b8deb222, move page table page
allocation and freeing into their own functions. Use these
functions to provide separate kernel vs. user page table page
accounting, and to wrap common tasks such as management of
zero-filled page state.
Requested by: markj, kib
Reviewed by: kib
(cherry picked from commit c2460d7cfe)
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the hand-rolled per-CPU PCID save count
as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active page table pages and physical-to-virtual
list pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.
Reviewed by: kib
(cherry picked from commit e4b8deb222)
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.
Add regression tests.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a85404906b)
When a cmpset for removing the PG_RW bit in pmap_promote_pde() fails,
there is no need to repeat the alignment, PG_A, and PG_V tests just to
reload the PTE's value. The only bit that we need be concerned with at
this point is PG_M. Use fcmpset instead.
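
A rough userland sketch of the retry pattern, assuming C11 atomics as a
stand-in for the kernel primitives: like fcmpset,
atomic_compare_exchange_weak() stores the current value into its
"expected" argument on failure, so the loop does not need a separate
reload. X_RW, X_M and clear_rw_if_clean() are hypothetical names, not
the pmap code:

```c
#include <stdatomic.h>
#include <stdint.h>

#define X_RW	UINT64_C(0x02)	/* hypothetical "writable" bit */
#define X_M	UINT64_C(0x40)	/* hypothetical "modified" bit */

/*
 * Clear X_RW in *pte if the entry is writable but not modified.
 * Returns the value the entry is known to hold afterwards.
 */
static uint64_t
clear_rw_if_clean(_Atomic uint64_t *pte)
{
	uint64_t old = atomic_load(pte);

	/* Only the X_M/X_RW state is re-tested after a failed exchange. */
	while ((old & (X_M | X_RW)) == X_RW) {
		/* On failure, "old" is refreshed with the current value. */
		if (atomic_compare_exchange_weak(pte, &old, old & ~X_RW))
			return (old & ~X_RW);
	}
	return (old);
}
```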
(cherry picked from commit 3687797618)
The call to pmap_allow_2m_x_page() in pmap_enter_object() is redundant.
Specifically, even without the call to pmap_allow_2m_x_page() in
pmap_enter_object(), pmap_allow_2m_x_page() is eventually called by
pmap_enter_pde(), so the outcome will be the same. Essentially,
calling pmap_allow_2m_x_page() in pmap_enter_object() amounts to
"optimizing" for the unexpected case.
Reviewed by: kib
(cherry picked from commit b7de535288)
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked. Eliminate some of these unnecessary tests.
Reviewed by: andrew, kib, markj
(cherry picked from commit e41fde3ed7)
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock. (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)
Reviewed by: kib, markj
(cherry picked from commit 1a8bcf30f9)
- Use malloc(9) to allocate the ivhd_hdrs list. The previous assumption
that there are at most 10 IVHDs in a system is not true. A
counterexample would be a system with 4 IOMMUs, where each IOMMU is
associated with IVHDs of type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that have
the same DeviceId but a lower-priority IVHD type.
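
The two points above can be sketched roughly as follows. This is an
illustration only: struct hdr and hdr_list_build() are hypothetical, the
userland malloc(3) stands in for malloc(9), and the sketch assumes that
a numerically larger type value means higher priority:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct hdr {
	uint16_t device_id;
	uint8_t	 type;	/* assumed: larger value = higher priority */
};

/*
 * Build a list with one entry per device_id, sized from the input
 * rather than from a fixed limit.  For every input header the whole
 * list built so far is scanned; an existing entry with the same
 * device_id is replaced only by a header of higher-priority type.
 * The caller frees the returned list.
 */
static const struct hdr **
hdr_list_build(const struct hdr *in, size_t n, size_t *countp)
{
	const struct hdr **list;
	size_t count = 0, i, j;

	*countp = 0;
	if (n == 0)
		return (NULL);
	list = malloc(n * sizeof(*list));
	if (list == NULL)
		return (NULL);
	for (i = 0; i < n; i++) {
		for (j = 0; j < count; j++) {
			if (list[j]->device_id == in[i].device_id)
				break;
		}
		if (j == count)
			list[count++] = &in[i];	/* first header for this id */
		else if (list[j]->type < in[i].type)
			list[j] = &in[i];	/* prefer the newer type */
	}
	*countp = count;
	return (list);
}
```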
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e8
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D29525
(cherry picked from commit 6fe60f1d5c)
In the hw.vmm.create sysctl handler the maximum length of a VM name is
VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow VM names of length VM_MAX_NAMELEN.
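
As a small illustration of the off-by-one involved: a buffer must be one
byte larger than the longest permitted name to leave room for the
terminating NUL. NAME_MAX_LEN and copy_name() below are hypothetical and
do not reflect the actual vmm code:

```c
#include <stdbool.h>
#include <string.h>

#define NAME_MAX_LEN 32	/* hypothetical limit, not counting the NUL */

/*
 * Copy the NUL-terminated string "src" into "dst", which must provide
 * NAME_MAX_LEN + 1 bytes.  Names of up to NAME_MAX_LEN characters are
 * accepted; anything longer is rejected without touching "dst".
 */
static bool
copy_name(char dst[NAME_MAX_LEN + 1], const char *src)
{
	size_t len = strlen(src);

	if (len > NAME_MAX_LEN)
		return (false);
	memcpy(dst, src, len + 1);	/* include the terminating NUL */
	return (true);
}
```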
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31372
(cherry picked from commit df95cc76af)
This does not appear to affect code generation, at least with the
default toolchain.
Noticed because incorrect output specifications lead to false positives
from KMSAN, as the instrumentation uses them to update shadow state for
output operands.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit e54ae8258d)
While here, use designated initializers and rename some AMD iommu method
implementations to match the corresponding op names. No functional
change intended.
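
For readers unfamiliar with the idiom, a generic sketch of an ops table
filled in with designated initializers, where each implementation is
named after the op it fills. struct unit_ops and the demo_* functions
are hypothetical and are not the vmm iommu interface:

```c
#include <stddef.h>

struct unit_ops {
	int	(*init)(void);
	void	(*cleanup)(void);
	void	(*enable)(void);
	void	(*disable)(void);
};

static int enabled;

static int	demo_init(void) { return (0); }
static void	demo_cleanup(void) { enabled = 0; }
static void	demo_enable(void) { enabled = 1; }
static void	demo_disable(void) { enabled = 0; }

/*
 * Each member is bound by name, so the table stays correct if the
 * struct gains or reorders members, and a mismatch between an op and
 * its implementation is easy to spot.
 */
static const struct unit_ops demo_ops = {
	.init = demo_init,
	.cleanup = demo_cleanup,
	.enable = demo_enable,
	.disable = demo_disable,
};
```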
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 41335c6b7f)
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.
The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.
Thanks to Mike Karels for testing and feedback on the driver.
Reviewed by: bcr (manpages), kbowling, scottl, erj
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30668
(cherry picked from commit 517904de5c)
There is no reason to initialize it to anything else, and this matches
initialization of the BSP. No functional change intended.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit e153745083)
KMSAN instrumentation requires thread-local storage to track
initialization state for function parameters and return values. This
buffer is accessed as part of each function prologue. It is provided by
the KMSAN runtime, which looks up a pointer in the current thread's
structure.
When KMSAN is configured, init_secondary() is instrumented, but this
means that GS.base must be initialized first, otherwise the runtime
cannot safely access curthread. Work around this by loading GS.base
before calling init_secondary(), so that the runtime can at least check
curthread == NULL and return a pointer to some dummy storage. Note that
init_secondary() still must reload GS.base after calling lgdt(), which
loads a selector into %gs, which in turn clears the base register.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4b136ef259)
The ACPI parsing code for rid ranges wrongly assumed that there is
only one start/end device id range. Besides, ivhd_dev_parse()
never worked as intended: the start/end rid info was always zero.
Restructure the code to build dynamic-sized tables for each IOMMU softc
holding device entries. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU
unit itself) are no-ops from now on. There is also a minor fix for
incorrect %b format string usage.
Tested on my EPYC 7282.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D30827
(cherry picked from commit b5c74dfd64)
Remove debugging stuff, since it's arguably not needed in a minimal
setup. Also remove vlan, tuntap and gif, since they can be loaded.
imp didn't include the part of the patch that removed xen guest support.
Xen guest is relatively small and has no way of being loaded.
Reviewed by: imp
PR: 229564
MFC After: 3 days
(cherry picked from commit b21f19c9e0)
The original %b description string is wrong.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D30805
(cherry picked from commit 210e6aec4f)
When a process has used sysarch(2) to specify descriptors for its
private LDT, upon rfork(RFMEM) descriptors are copied into the new child
process. Any updates to the descriptors are thus reflected to all other
processes sharing the vmspace. However, this is incorrect in the rather
obscure case where the child process was created before the LDT was
modified. Fix this by only modifying other processes which already
share the LDT.
Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 70dd5eebc0)
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.
Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692
(cherry picked from commit 5d2d599d3f)
Otherwise it is copied from the creating thread. Then, if either thread
exits, the other is left with a dangling pointer, typically resulting in
a page fault upon the next context switch.
Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 8cd05b8833)
We only need to ensure that interrupts are disabled when handling a
fault from iret. Otherwise it's possible to trigger the assertion
legitimately, e.g., by copying in from an invalid address.
Fixes: 4a59cbc12
Reported by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6cda627556)