This avoids bloating the kernel image when MAXCPU is large.
A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer. Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.
PR: 269572
MFC after: never
Reviewed by: mjg, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39806
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
A signed one-bit wide bit-field can take only the values 0 and -1. Clang
16 introduced a warning that "implicit truncation from 'int' to a
one-bit wide bit-field changes value from 1 to -1". Fix the warnings by
using C99 bool.
Reported by: Clang 16
Reviewed by: emaste, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39705
Simply moving the interrupt allocation and release functions into files
which belong to the architecture. Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.
Identifying the border between x86 and architecture-independent is
actually quite tricky. Similarly, getting the prototypes for the
border right is also quite tricky.
Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
The x86 PIC interface is very much x86-specific and not used by other
architectures. Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.
Introduce wrappers for calls into the architecture-dependent interrupt
layer. All architectures need roughly the same functionality, but the
interface is slightly different between architectures. Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.
The original implementation was done by Julien Grall in 2015, but this
has required major updating.
Removal of PVHv1 meant substantial portions disappeared. The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.
In the original implementation the wrappers were normal functions. Some
had empty stubs in xen_intr.c and were removed.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header. A
similar situation exists for the NR_EVENT_CHANNELS constant.
Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.
This was originally implemented by Julien Grall, but has been heavily
modified. The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around. This allows the
architecture to add function definitions which use struct xenisrc.
The original version only moved xi_intsrc into xen_arch_isrc_t. Moving
xi_vector was done by the submitter.
The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t. Those disappeared with the removal of PVHv1 support.
Copyright note. The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs. Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
- Adjust some line lengths
- Fix comment about NR_EVENT_CHANNELS after movement.
- Use #include instead of symlinks.
Part of the series for allowing FreeBSD/ARM to run on Xen. On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.
While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region. As such make
this a distinct macro. Only callers in machine independent code have
been switched.
Add a wrapper for the x86 PIC interface to use matching the old
prototype.
Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
In 38d1ac34ff SIGATOMIC_{MIN,MAX} were
defined in terms of LONG_{MIN,MAX}. Later, they were switched to
__LONG_{MIN,MAX} in 78fe75bc28 where an
include of machine/_limits.h was added. Switch to using fixed width
INT64_{MIN,MAX} and remove the header pollution.
No functional change.
Reviewed by: theraven, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D39196
When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does. This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.
To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.
Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38342
This covers all currently defined bits, adding PKRU and TILE.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38219
Linux reads MISC_FEATURES_ENABLES to manage the CPUID faulting feature
(undocumented in the Intel SDM, but documented in 323850-004 (Intel
Virtualization Technology FlexMigration Application Note). Since bhyve
doesn't emulate this feature, we always return 0. Neither does bhyve
support the MONITOR/MWAIT fault bit also in this MSR (which is
documented in the sdm), so always return 0.
Sponsored by: Netflix
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D36602
MAX_APIC_ID must be at least twice MAXCPU. Increase it to 0x800 so that
it is possible to set MAXCPU to 512 or 1024 in a custom kernel config
file.
Note that increasing this limit does not itself cause any allocations
to be larger; it just allows madt_parse_cpu() to process higher APIC
IDs.
APIC IDs may be sparse and so we can waste memory. This is independent
of this change, but becomes more of an issue as the maximum APIC ID
grows. This should be addressed with future work.
Reviewed by: royger
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37067
They were copy-pasted when x86/include/elf.h file was merged from its
i386 and amd64 counterparts. Having the text around inclusions
significantly different is somewhat confusing.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37085
both for i386 native and compat32 amd64. We know the ld-elf.so.1 size
in advance, it fits there. Trying to push it up after the end of a
binary cannot work reliably and eventually fail for large binaries.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37085
and not on the trampoline stack. This is a useful way to ensure that
we did not enabled interrupts while on user %cr3 or trampoline stack.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
These files no longer depend on the macros required when these checks
were added.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804
All supported compilers (modern versions of GCC and clang) support
this.
Many places didn't have an #else so would just silently do the wrong
thing. Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.
PR: 263102 (exp-run)
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D34797
This register set contains the values of the fsbase and gsbase
registers. Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps. In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.
The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34650
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.
PR: 262155
Reviewed by: emaste
Tested by: emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34367
All supported Xen instances by FreeBSD provide a local APIC
implementation, so there's no need to replace the native local APIC
implementation anymore.
Leave just the ipi_vectored hook in order to be able to override it
with an implementation based on event channels if the underlying local
APIC is not virtualized by hardware. Note the hook cannot use ifuncs,
because at the point where ifuncs are resolved the kernel doesn't yet
know whether it will benefit from using the optimization.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D33917
Instead of using event channels or hypercalls to deal with IPIs and
NMIs.
Using a hardware virtualized APIC should be faster than using any PV
interface, since the VM exit can be avoided.
Xen exposes whether the domain is using hardware assisted x{2}APIC
emulation in a CPUID bit.
Sponsored by: Citrix Systems R&D
When running as a Xen guest it's easier to use an hypercall in order
to do power management operations (power off, power cycle). Do this
for all supported guest types (HVM and PVH). Note that for HVM the
power operation could also be done using ACPI, but there's no reason
to differentiate between PVH and HVM.
While there fix the shutdown handler to properly differentiate between
power cycle and power off requests.
Reported by: Freddy DISSAUX
MFC: 1 week
Sponsored by: Citrix Systems R&D
- Move busdma_lock_mutex to subr_bus_dma.c.
- Move _busdma_lock_dflt to subr_bus_dma.c. This function was named a
couple of different things previously. It is not a public API but
an internal helper used in place of a NULL pointer. The prototype
is in <sys/bus_dma.h> as not all backends include
<sys/bus_dma_internal.h>.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33694
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
This ensures that LAPIC calibration is done using the correct tsc_freq
value, i.e., the one associated with the TSC timecounter. It does mean
though that TSC calibration cannot use sbinuptime() to read the
reference timecounter, as timehands are not yet set up.
Reviewed by: kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33209
This ensures that we have a good reference timecounter for performing
calibration.
Change lapic_setup to avoid configuring the timer when booting, and move
calibration and initial configuration to a new lapic routine,
lapic_calibrate_timer. This calibration will be initiated from
cpu_initclocks(), before an eventtimer is selected.
Reviewed by: kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33206
The headers were mostly identical on amd64 and i386.
No functional change intended.
Reviewed by: cperciva, mav, imp, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33205
It stopped being used in 3c256f5395, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33253
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers. Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().
Reviewed by: alc, kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble. Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant. Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.
PR: 258678
Reported by: cy
Reviewed by: cy, kevans, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32060
when compiling in amd64 kernel environment with -m32. This is a temporal
workaround for some future proper (but unclear) fix.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.
Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Add support for the KVM paravirtual clock device.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29733
Consolidate more hypervisor-agnostic functionality behind a new 'struct
pvclock' API.
This should also make it easier to subsequently add hypervisor-agnostic
vDSO timekeeping support.
Also, perform some clean-up:
- Remove 'pvclock_get_last_cycles()'; do not allow external access
to 'pvclock_last_systime' since this is not necessary.
- Consolidate/simplify wall and system time reading codepaths.
- Ensure correct ordering within wall and system time reading
codepaths via 'atomic(9)' and 'rdtsc_ordered()' rather than via
'rmb()'.
- Remove some extra newlines.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416