Per recent discussions on arch@ and at the BSDCan developer summit, we
are considering removing support for 32-bit platforms (in some form)
for 15.0 (at the earliest). A final decision on what will ship in
15.0 will be made closer to the release of 15.0. However, we should
communicate the potential deprecation in 14.0 to provide notice to
users.
This commit adds a warning during boot on 32-bit kernels that they are
deprecated and may be removed in 15.0. More details will be included
in a followup commit to RELNOTES.
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D41163
As uncovered by e3ba0d6add we are copying lots of irrelevant options
from the listener to an accepted socket, even those that aren't relevant
to a non-listener, e.g. SO_REUSE*, SO_ACCEPTFILTER. Stop doing that
and provide a fixed opt-in list for options to be inherited. Ideally
we shall not inherit anything at all. For compatibility inherit a set
of options that are meaningful for a non-listening socket of a protocol
that can listen(2).
Differential Revision: https://reviews.freebsd.org/D41412
Fixes: e3ba0d6add
If a slot is freed that isn't the last one, we'll set its destructor to
NULL to indicate that it's been freed and leave a hole in the slot map.
Check osd_destructors in osd_call() to avoid dereferencing a method that
is potentially from a module that's been unloaded.
This scenario would most commonly surface when two modules are loaded
that osd_register(), then the earlier one deregisters and an osd_call()
is made after the fact. In the specific report that triggered the
investigation, kldload if_wg -> kldload linux* -> kldunload if_wg ->
destroy a jail -> panic.
Noted in the review, but left for follow-up work, is that the realloc
that may happen in osd_deregister() should likely go away and the
assumption that reallocating to a smaller size cannot fail is actually
not correct.
Reported by: dim
Reviewed by: markj, jamie
Differential Revision: https://reviews.freebsd.org/D41404
subr_rangeset.c is the only source file that calls functions like
pctrie_insert and pctrie_remove directly; other users of pctries use
the PCTRIE_DEFINE macro to define interfaces to pctrie that let them
ignore issues of offsets within structs and uint64_t return values.
Change subr_rangeset.c to use PCTRIE_DEFINE too. And change pctrie.h
to mark the lookup function as unused, to avoid warnings when
compiling files, like subr_rangeset.c, that don't invoke lookup().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41391
This reverts commit 96c76d9306.
There are other relatively common reasons why init might get killed
during reboot, the workaround was really not sparc64-specific.
Discussed with: marius
Sponsored by: The FreeBSD Foundation
This reverts commit 5b353925ff.
The reason is the lesser scalability of the vnode' rangelock comparing
with the vnode lock.
Requested by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41334
Previously the log message indicated only "(core dumped)" if a core was
successfully created, or nothing if it was not. This provides
insufficient information to faciliate debugging. Dtrace is no help as
coredump() is static and we cannot find the return value via fbt.
Expand the log message to include error return value information.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39942
Early during boot, thread0 runs with td->td_ucred == NULL. This is
fixed up in proc0_init() at SI_SUB_INTRINSIC. If a panic occurs before
then, rather than dereference a NULL pointer, simply allow the thread to
enter KDB.
Reported by: stevek
Reviewed by: mhorne, stevek
MFC after: 1 week
Fixes: cab1056105 ("kdb: Modify securelevel policy")
Differential Revision: https://reviews.freebsd.org/D41280
These are modeled on the API used for m_copydata/m_copyback and the
crypto buffer APIs. One day it might be nice to reduce the
proliferation of these by adding cursors and using memdesc directly
for crypto request buffers.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D40615
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.
This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.
Reviewed by: alc
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41235
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.
For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the caller, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.
This takes 3 instrutions and 14 bytes out of the basic lookup loop on
amd64.
Reviewed by: kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41226
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.
This speeds up radix trie searches by few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.
Reviewed by: alc, kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41171
Currently for the MFS, firmware and VDSO template assembly files we pass
the path to include with .incbin unquoted and use __XSTRING within the
assembly file to stringify it. However, __XSTRING doesn't just perform a
single level of expansion, it performs the normal full expansion of the
macro, and so if the path itself happens to tokenise to something that
includes a defined macro in it that will itself be substituted. For
example, with #define MACRO 1, a path like /path/containing/MACRO/in/it
will expand to /path/containing/1/in/it and then, when stringified, end
up as "/path/containing/1/in/it", not the intended string. Normally,
macros have names that start or end witih underscores and are unlikely
to appear in a tokenised path (even if technically they could), but now
that we've switched to GNU C as of commit ec41a96daa ("sys: Switch the
kernel's C standard from C99 to GNU99.") there are a few new macros
defined which don't start or end with underscores: unix, which is always
defined to 1, and i386, which is defined to 1 on i386. The former
probably doesn't appear in user paths in practice, but the latter has
been seen to and is likely quite common in the wild.
Fix this by defining the macro pre-quoted instead of using __XSTRING.
Note that technically we don't need to do this for vdso_wrap.S today as
all the paths passed to it are safe file names with no user-controlled
prefix but we should do it anyway for consistency and robustness against
future changes.
This allows make tinderbox to pass when built with source and object
directories inside ~/path-with-unix, which would otherwise expand to
~/path-with-1 and break.
PR: 272744
Fixes: ec41a96daa ("sys: Switch the kernel's C standard from C99 to GNU99.")
Upon discovering a violation kmsan_check_arg() passes a pointer to
function parameter shadow state to kmsan_report_hook().
kmsan_report_hook() uses that address to find the origin cells, assuming
that the passed address belongs to the kernel map. This has two
problems:
1) Function parameter origin state is also located in TLS, not in the
origin map, but kmsan_report_hook() doesn't know this.
2) KMSAN TLS for thread0 is statically allocated and thus isn't shadowed
(because the kernel itself is not shadowed).
These bugs could result in inaccuracies in KMSAN reports, or a page
fault when trying to report a KMSAN violation (which by default panics
the kernel anyway).
Fix the problem by making callers of kmsan_report_hook() provide a
pointer to origin cells.
Sponsored by: The FreeBSD Foundation
When we are sending terminating signal to the group, killpg() needs
to guarantee that all group members are to be terminated (it does not
need to ensure that they are terminated on return from killpg()). The
pg_killsx change eliminates the largest window there, but still, if
a multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continue executing and terminates the process
This scenario allows the child to escape still.
Fix it by single-threading forking parent if a conflict with pg_killsx
is noted. We try to lock pg_killsx without sleeping, and failure to
acquire it means that a parallel killpg(2) is executed. Then, stop
other threads for running and in particular, receive signals, to avoid
the situation explained above.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
This should improve signal delivery latency and better expose the
process state to the executing threads.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
This reverts commits 81a37995c7 and
565a343ae3.
There is still a leakage of the p_killpg_cnt, some but not all sources
of which were identified.
Second, and more important, is that there is a fundamental issue with
blocked signals having KSI_KILLPG flag set. Queueing of such signal
increments p_killpg_cnt, but it cannot be decremented until the signal
is delivered. If, for instance, a single-threaded process with blocked
signal receives killpg-kill and executes fork(2), the fork enter check
returns with ERESTART. And since signal is blocked, the condition
cannot be cleared.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
To ensure atomicity of reads against parallel writes and truncates,
vnode lock was not enough at least since introduction of vn_io_fault().
That code only take rangelock when it was possible that vn_read() and
vn_write() could drop the vnode lock.
At least since the introduction of VOP_READ_PGCACHE() which generally
does not lock the vnode at all, rangelocks become required even
for filesystems that do not need vn_io_fault() workaround. For
instance, tmpfs.
PR: 272678
Analyzed and reviewed by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41158
These are no longer needed after commit c5312bd79e. This did
require adding an include of <sys/limits.h> instead for SIZE_T_MAX
which previously was dragged in via header pollution.
IFCAP2_* has the bit position and not the shifted value.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41100
Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.
The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced 20% in a buildworld test.
Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40936
This routine is specific to CAM and no longer assumes any internal
bus_dma knowledge as it is simple wrapper around bus_dmamap_load_mem.
Fixes: 60381fd1ee memdesc: Retire MEMDESC_CCB.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41058
Instead, change memdesc_bio to examine the bio and return a memdesc of
a more generic type describing the data buffer.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41029
This memory descriptor is backed by an array of VM pages. This type
requires adding a new field to 'struct memdesc' to hold the offset
within the first page. For LP64 systems, this new field is added in
an existing padding hole so does not increase the size. For ILP32
systems, this grows 'struct memdesc' from 12 to 16 bytes.
Reviewed by: imp, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41028
Instead, change memdesc_ccb to examine the CCB and return a memdesc of
a more generic type describing the data buffer.
Reviewed by: imp, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D40880
Since this function ultimately calls vn_lock() or VOP_LOCK1(), allows it to
receive and pass this flag which is used in the lookup code and doesn't
interfere with the function's operation.
Reviewed by: kib, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40954
bufobj_init depends on fields bo_dirty.bv_root and bo_clean.bv_root
being zeroed on entry and pctrie_init zeroing whatever is passed to
them, and so does not call pctrie_init for either of them. That fails
if pctrie_init ever changes to do something other that zeroing data,
so add explicit calls to them.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D40978
This reverts commits 4a402dfe0b and
3bffa22623.
The fix will be implemented in somewhat different manner. The semantic
adjustment is incompatible with linuxolator expectations.
Reported and reviewed by: dchagin
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40969
Previously, even if the FIONBIO or FIOASYNC ioctl failed, the file's
f_flags variable would still be changed. Now, kern_fcntl will restore
the original flags if the ioctl fails.
PR: 265736
Reported by: Yuval Pavel Zholkover <paulzhol@gmail.com>
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40955
Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40807
otherwise we could end up with the livelock. When pg_killsx trylock
failed, ensure that we do wait for lock availability before retry.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week