This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
(cherry picked from commit 548a2ec49b)
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
(cherry picked from commit b792434150)
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
(cherry picked from commit 435754a59e)
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.
In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710
(cherry picked from commit 67932460c7)
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
(cherry picked from commit 254e4e5b77)
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
For stable/13 only, sys/arm/include/tls.h includes support for
ARM_TP_ADDRESS which is not present in main.
(cherry picked from commit 1a62e9bc00)
This is the more standard name for the bias of dtv pointers used on
other platforms. This also fixes a few other places that were using
the wrong bias previously on MIPS such as dlpi_tls_data in struct
dl_phdr_info and the recently added __libc_tls_get_addr().
Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33346
(cherry picked from commit 03f6b14106)
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).
Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
setup in platforms' init_secondary(), but it has the wrong td_lock.
Typical platform AP startup procedure looks something like:
- Setup curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler
cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().
Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.
(cherry picked from commit 589aed00e3)
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9feff969a0)
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32. This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping. For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place. Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.
--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.
Reviewed by: kib, markj, bdragon
(cherry picked from commit 8dc8feb53d)
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
PR: 259157
Reviewed by: mav, kib, markj (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32580
(cherry picked from commit 0d2224733e)
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
(cherry picked from commit 588ab3c774)
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
(cherry picked from commit 10fe6f80a6)
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
(cherry picked from commit 1d2d1418b4)
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
(cherry picked from commit 1adebe3cd6)
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ecbbe83144)
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 09100f936b)
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a4667e09e6)
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30136
(cherry picked from commit 883a0196b6)
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
(cherry picked from commit ab4ed843a3)
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
(cherry picked from commit 31991a5a45)
amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b092c58c00)
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
(cherry picked from commit 720dc6bcb5)
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.
Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692
(cherry picked from commit 5d2d599d3f)
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
(cherry picked from commit 7446b0888d)
Use the new kdb variants. Print more specific error messages.
Reviewed by: jhb, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
(cherry picked from commit 9d81dd5404)
This basically mirrors what already exists in ddb, but provides a
slightly improved interface. It allows the caller to specify the
watchpoint access type, and returns more specific error codes to
differentiate failure cases.
This will be used to support hardware watchpoints in gdb(4).
Stubs are provided for architectures lacking hardware watchpoint logic
(mips, powerpc, riscv), while other architectures are added individually
in follow-up commits.
Reviewed by: jhb, kib, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
(cherry picked from commit 763107f26c)
This is the only in-tree driver for the asymmetric crypto support in
OCF that is already marked deprecated for 14.
Sponsored by: Chelsio Communications
(cherry picked from commit 096a847216)
Commit 248f0ca converted intrcnt and intrnames from u_long[]
and char[] to u_long* and char* respectively, but for non-INTRNG mips
these symbols were defined in .S file as a pre-allocated static arrays,
so the problem wasn't cought at compile time. Conversion from an array
to a pointer requires pointer initialization and it wasn't done
for MIPS, so whatever happenned to be in the begginning of intcnt[]
array was used as a pointer value.
Move intrcnt/intrnames to C code and allocate them dynamically
although with a fixed size at the moment.
Reviewed by: emaste
PR: 253051
Differential Revision: https://reviews.freebsd.org/D28424
MFC after: 1 day
(cherry picked from commit e0a0a3efcb)
mips: fix NLM platforms breakage caused by e0a0a3ef
NetLogic platforms have their own implementation of cpu_init_interrupts.
Apply the same logic to it as to intr_machdep.c.
PR: 253051
(cherry picked from commit d6f9c5a6d2)
Use a machdep.nirq tunable intead of compile-time constant NIRQ
as a value for maximum number of interrupts. It allows keep a system
footprint small by default with an option to increase the limit
for large systems like server-grade ARM64
Reviewd by: mhorne
Differential Revision: https://reviews.freebsd.org/D27844
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
This does an import of quirk stubs, debugging macros from USB code and
numerous usage constants used by dependent drivers.
Besides, this change renames some functions to get a better matching
with userland library and NetBSD/OpenBSD HID code. Namely:
- Old hid_report_size() renamed to hid_report_size_max()
- New hid_report_size() calculates size of given report rather than
maximum size of all reports.
- hid_get_data_unsigned() renamed to hid_get_udata()
- hid_put_data_unsigned() renamed to hid_put_udata()
Compat shim functions are provided in usbhid.h to make possible compile
of legacy code unmodified after this change.
Reviewed by: manu, hselasky
Differential revision: https://reviews.freebsd.org/D27887
It will be used by the upcoming HID-over-i2C implementation. Should be
no-op, except hid.ko module dependency is to be added to affected drivers.
Reviewed by: hselasky, manu
Differential revision: https://reviews.freebsd.org/D27867
Upon exit from the debugger, checking the return code of kdb_trap()
allows one to retry the fatal page fault. This matches what is done on
all other architectures.
Reviewed by: jhb (earlier version)
Differential Revision: https://reviews.freebsd.org/D27535
These aligned the address but then always used the least significant
bits of the value in memory, which is the wrong half 50% of the time for
16-bit atomics and the wrong quarter 75% of the time for 8-bit atomics.
These bugs were all present in r178172, the commit that added the mips
port, and have remained for its entire existence to date.
Reviewed by: jhb (mentor)
Approved by: jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D27343
- Fix kernel stack unwinding end-of-function false-positive
The kernel stack unwinder assumes that any jr $ra indicates the end
of the current function. However, modern compilers generate code
that contains jr $ra at various places inside the function.
- Handle LLD inter-function padding when looking for the start of a
function.
- Use call site for symbol name/offset when unwinding
Currently we use the return address, which will normally just give
an output that's off by 8 from the actual call site. However, for
tail calls, this is particularly bad, as we end up printing the
symbol name for the function that comes after the one that made the
call. Instead we should go back two instructions from the return
address for the unwound program counter.
Submitted by: arichardson (1, 2), jrtc27 (3)
Reviewed by: arichardson
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27363
As of r365978, minidumps include a copy of dump_avail[]. This is an
array of vm_paddr_t ranges. libkvm walks the array assuming that
sizeof(vm_paddr_t) is equal to the platform "word size", but that's not
correct on some platforms. For instance, i386 uses a 64-bit vm_paddr_t.
Fix the problem by always dumping 64-bit addresses. On platforms where
vm_paddr_t is 32 bits wide, namely arm and mips (sometimes), translate
dump_avail[] to an array of uint64_t ranges. With this change, libkvm
no longer needs to maintain a notion of the target word size, so get rid
of it.
This is a no-op on platforms where sizeof(vm_paddr_t) == 8.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27082
- Validate any stack addresses read from against td_kstack before
reading. If an unwind operation would attempt to read outside the
bounds of td_kstack, abort the unwind instead.
- For stack_save_td(), don't use the PC and SP from the current
thread, instead read the PC and SP from pcb_context[].
- For stack_save(), use the current PC and SP of the current thread,
not the values from pcb_regs (the horribly named td_frame of the
outermost trapframe). The result was that stack_trace() never
logged _any_ kernel frames but only the frame from the saved
userspace registers on entry from the kernel.
- Inline the one use of stack_register_fetch().
- Add a VALID_PC() helper macro and simplify types to remove
excessive casts in stack_capture().
- Fix stack_capture() to work on compilers written in this century.
Don't treat function epilogues as function prologues by skipping
additions to SP when searching for a function start.
- Add some comments to stack_capture() and fix some style bugs.
Reviewed by: arichardson
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27358
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225