19886 commits

Author SHA1 Message Date
Gleb Smirnoff
be7c095ac9 unix/dgram: bump maximum datagram size limit to 8k
This is important for wpa_supplicant operation on a crowded network.

Note: we actually need an API to increase maximum datagram size on a
socket.  Previously SO_SNDBUF magically acted like that, but that was
an undocumented "feature".

Also move the comment to the proper line.  Previously it was the receive
buffer that imposed the limit.  Now the notions of buffer size and maximum
datagram size are separate.

Reviewed by:		bz, tuexen, karels
Differential Revision:	https://reviews.freebsd.org/D42830
PR:			274990
2023-12-01 15:37:29 -08:00
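A minimal userland probe for the limit discussed in be7c095ac9 above. It sends one datagram over an AF_UNIX SOCK_DGRAM socketpair and reads it back. The helper name unix_dgram_roundtrip() is hypothetical. On a FreeBSD kernel that includes this change, an 8 KB datagram is expected to go through. On a kernel with a lower limit, send(2) is expected to fail, typically with EMSGSIZE. Other systems apply their own limits.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one datagram of 'len' bytes over a local
 * datagram socketpair and read it back.  Returns the number of bytes
 * received, or -errno on failure (e.g. -EMSGSIZE when 'len' exceeds
 * the kernel's maximum datagram size).
 */
static ssize_t
unix_dgram_roundtrip(size_t len)
{
	int sv[2];
	char *out, *in;
	ssize_t n;

	assert(len > 0);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-errno);
	out = malloc(len);
	in = malloc(len);
	if (out == NULL || in == NULL) {
		n = -ENOMEM;
		goto done;
	}
	memset(out, 'x', len);
	if ((n = send(sv[0], out, len, 0)) == -1) {
		n = -errno;		/* save errno before close() can clobber it */
		goto done;
	}
	if ((n = recv(sv[1], in, len, 0)) == -1)
		n = -errno;
	else if (memcmp(in, out, (size_t)n) != 0)
		n = -EIO;		/* payload corrupted: should not happen */
done:
	free(out);
	free(in);
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

Calling unix_dgram_roundtrip(8192) and checking for 8192 versus -EMSGSIZE is one way to tell which side of the change a kernel is on.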
Gleb Smirnoff
0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just as was done for accept(2) in cfb1e92912, use the same approach
for two simpler syscalls that return socket addresses.  Although
these two syscalls aren't performance critical, this change shares
some code among the three syscalls, trimming code size.

Following the example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
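The wrapper pattern described in 0fac350c54 can be sketched in userland C under some simplifying assumptions. The caller owns the (stack) sockaddr_storage, the protocol method only fills it in, and a debug-only assertion stands in for the INVARIANTS check. Every name here is hypothetical: pr_sockaddr_t, so_sockaddr_wrap(), struct toy_inpcb and toy_in_peeraddr(). The sketch does not model VNET, and it skips sa_len because not every platform has that field.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical protocol method: fill caller-provided storage, no malloc. */
typedef int (*pr_sockaddr_t)(const void *pcb, struct sockaddr *sa);

/*
 * Hypothetical wrapper in the spirit of sopeeraddr()/sosockaddr(): the
 * caller supplies the storage and the wrapper sanity-checks the result.
 */
static int
so_sockaddr_wrap(pr_sockaddr_t method, const void *pcb,
    struct sockaddr_storage *ss)
{
	int error;

	memset(ss, 0, sizeof(*ss));
	error = method(pcb, (struct sockaddr *)ss);
	/* INVARIANTS-style check: success must leave a family set. */
	assert(error != 0 || ss->ss_family != AF_UNSPEC);
	return (error);
}

/* Toy IPv4 control block and its peeraddr method (both hypothetical). */
struct toy_inpcb {
	struct in_addr	faddr;
	uint16_t	fport;		/* host byte order */
};

static int
toy_in_peeraddr(const void *pcb, struct sockaddr *sa)
{
	const struct toy_inpcb *inp = pcb;
	struct sockaddr_in *sin = (struct sockaddr_in *)sa;

	sin->sin_family = AF_INET;
	sin->sin_addr = inp->faddr;
	sin->sin_port = htons(inp->fport);
	return (0);
}
```

Because the storage lives in the caller's frame, there is nothing to free on any return path. That is the property the commit is after.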
Gleb Smirnoff
cfb1e92912 sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill in.
Generic code should provide a sockaddr_storage; specialized code may provide
a smaller structure.

While rewriting accept(2), make 'addrlen' a true in/out parameter, reporting
the required length in case the provided length was insufficient.  Neither
our accept(2) manual page nor POSIX explicitly requires that, but one can
read the text as requiring it.  Linux also does that.  Update tests accordingly.

Reviewed by:		rscheff, tuexen, zlei, dchagin
Differential Revision:	https://reviews.freebsd.org/D42635
2023-11-30 08:30:55 -08:00
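A userland sketch of the in/out 'addrlen' behaviour from cfb1e92912. It accepts a loopback TCP connection while handing accept(2) a deliberately short buffer, then returns the length the kernel reports back. The helper accept_reported_len() is hypothetical, and the sketch assumes a working loopback interface. Linux documents this behaviour, and after this commit FreeBSD is expected to match. A system that reports only the truncated length would return 4 for the short-buffer call.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: accept one loopback TCP connection while handing
 * accept(2) only 'give' bytes of address buffer, and return the length
 * reported back in 'addrlen' (0 on any setup failure).
 */
static socklen_t
accept_reported_len(socklen_t give)
{
	struct sockaddr_in sin;
	struct sockaddr_storage peer;
	socklen_t len = sizeof(sin), addrlen = 0;
	int lfd, cfd = -1, afd = -1;

	assert(give <= sizeof(peer));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* port 0: any */

	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (0);
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lfd, 1) == -1 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) == -1)
		goto out;
	if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
	    connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;

	/* In: size of the buffer we provide.  Out: peer address length. */
	addrlen = give;
	afd = accept(lfd, (struct sockaddr *)&peer, &addrlen);
	if (afd == -1)
		addrlen = 0;
out:
	if (afd != -1)
		close(afd);
	if (cfd != -1)
		close(cfd);
	close(lfd);
	return (addrlen);
}
```

With a 4-byte buffer the address is truncated. The reported length is still sizeof(struct sockaddr_in), so the caller learns how much space it should have supplied.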
Jamie Gritton
ed31b3f4a1 jail: Don't allow jail_set(2) to resurrect dying jails.
Currently, a prison in "dying" state (removed but still holding
resources) can be brought back to alive state via "jail -d", or
the JAIL_DYING flag to jail_set(2).  This seemed like a good idea
at the time.

Its main use was to improve support for specifying the jid when
creating a jail, which also seemed like a good idea at the time.
But resurrecting a jail that was partway through the process of
shutting down is trouble waiting to happen.

This patch deprecates that flag, leaving it as a no-op for creating
jails (but still useful for looking at dying jails).  It still allows
creating a new jail with the same jid as a dying one, but will renumber
the old one in that case.  That's imperfect, but allows for current
behavior.

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D28150
2023-11-29 16:12:13 -08:00
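The new jid rule in ed31b3f4a1 can be modeled in a few lines. A live jail with the requested jid is a conflict. A dying jail with that jid is never resurrected; instead it is renumbered out of the way. This is a toy and not kern_jail.c. The struct, the fixed-size table, the lowest-free-jid policy and the EAGAIN "table full" result are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, greatly simplified model of the prison list. */
struct toy_prison {
	int	jid;
	bool	dying;		/* removed, but still holding resources */
	bool	in_use;		/* slot occupied */
};

#define	TOY_NPRISON	8

static struct toy_prison *
toy_find(struct toy_prison *pr, int jid)
{
	for (size_t i = 0; i < TOY_NPRISON; i++)
		if (pr[i].in_use && pr[i].jid == jid)
			return (&pr[i]);
	return (NULL);
}

/* Lowest positive jid not in use (hypothetical allocation policy). */
static int
toy_free_jid(struct toy_prison *pr)
{
	int jid = 1;

	while (toy_find(pr, jid) != NULL)
		jid++;
	return (jid);
}

/* Create a jail with a caller-chosen jid, renumbering a dying holder. */
static int
toy_create_with_jid(struct toy_prison *pr, int jid)
{
	struct toy_prison *old, *slot = NULL;

	old = toy_find(pr, jid);
	if (old != NULL && !old->dying)
		return (EEXIST);		/* live jail owns this jid */
	for (size_t i = 0; i < TOY_NPRISON; i++)
		if (!pr[i].in_use) {
			slot = &pr[i];
			break;
		}
	if (slot == NULL)
		return (EAGAIN);		/* toy table is full */
	if (old != NULL)
		old->jid = toy_free_jid(pr);	/* move the dying jail aside */
	*slot = (struct toy_prison){ .jid = jid, .dying = false, .in_use = true };
	return (0);
}
```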
Konstantin Belousov
c5405d1c85 vn_copy_file_range(): provide ENOSYS fallback to vn_generic_copy_file_range()
Reviewed by:	markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42603
2023-11-28 19:32:53 +02:00
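The fallback named in the subject of c5405d1c85 follows a common pattern. First try the filesystem-specific operation. If it reports ENOSYS ("not supported here"), use the generic implementation. Any other result passes through unchanged. In this sketch, hypothetical function pointers stand in for VOP_COPY_FILE_RANGE() and vn_generic_copy_file_range(). It does not reproduce the real call signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a copy implementation. */
typedef int (*copy_fn_t)(void *ctx);

/*
 * Try the filesystem-specific copy first; on ENOSYS fall back to the
 * generic one.  Success or a real error from the specific copy is
 * returned as-is.
 */
static int
copy_with_fallback(copy_fn_t specific, copy_fn_t generic, void *ctx)
{
	int error;

	assert(generic != NULL);
	error = (specific != NULL) ? specific(ctx) : ENOSYS;
	if (error == ENOSYS)
		error = generic(ctx);
	return (error);
}

/* Toy implementations used to exercise the fallback (hypothetical). */
static int generic_calls;

static int toy_generic(void *ctx) { (void)ctx; generic_calls++; return (0); }
static int toy_unsupported(void *ctx) { (void)ctx; return (ENOSYS); }
static int toy_failing(void *ctx) { (void)ctx; return (EIO); }
```

Only ENOSYS triggers the fallback. A real error such as EIO is not masked by silently retrying with the generic path.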
Konstantin Belousov
a9bc863769 vn_copy_file_range(): find write vnodes on which to call the VOP
Reviewed by:	markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42603
2023-11-28 19:32:53 +02:00
Konstantin Belousov
4cbe4c48a7 VFS: add VOP_GETLOWVNODE()
It is similar to VOP_GETWRITEMOUNT(): for a given vnode vp, it should
return the lower vnode that would actually handle a write to vp.
Flags allow specifying FREAD or FWRITE for the benefit of a possible
unionfs implementation.

Reviewed by:	markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42603
2023-11-28 19:32:53 +02:00
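A toy model of what VOP_GETLOWVNODE() answers for a stacked filesystem such as nullfs: given a vnode, find the vnode that would actually service the I/O. The struct and the function name are hypothetical. The sketch assumes each stacking layer has exactly one lower vnode. It also omits reference counting, locking and the FREAD/FWRITE choice a unionfs-like layer would make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for a vnode in a stacked filesystem. */
struct toy_vnode {
	const char		*fs;	/* e.g. "nullfs", "ufs" */
	struct toy_vnode	*lower;	/* NULL if not a stacking layer */
};

/*
 * Walk down through stacking layers until reaching a vnode with no
 * lower layer; that vnode is the one that would handle the write.
 */
static struct toy_vnode *
toy_getlowvnode(struct toy_vnode *vp)
{
	assert(vp != NULL);
	while (vp->lower != NULL)
		vp = vp->lower;
	return (vp);
}
```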
Konstantin Belousov
171f0832c5 EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of fork
Since a kqueue timer may exist after the process that created it exited
(same scenario with rfork(2) as in PR 275286), make the tailq
p_kqtim_stop accessed by filt_timerdetach() type-stable.

Noted and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42777
2023-11-28 19:29:58 +02:00
Konstantin Belousov
ed410b78ed EVFILT_SIGNAL: do not use target process pointer on detach
It is enough to know knlist to remove from it, and the list is
autodestroyed on last removal.

PR:	275286
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42777
2023-11-28 19:29:58 +02:00
Konstantin Belousov
877ef68532 Revert "kqueue: on process exit, force-clear its registered signal events"
This reverts commit 393ac29f0b.  A different fix follows, which
preserves the semantics required by the sys.kqueue.proc3_test.proc3
test.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
PR:	275286
Differential revision:	https://reviews.freebsd.org/D42777
2023-11-28 19:29:58 +02:00
Mateusz Guzik
e1e847374b Add DEBUG_POISON_POINTER
If you have a pointer which you know points to stale data, you can
fill it with junk so that a later dereference will trap.

Reviewed by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D40946
2023-11-28 16:33:46 +00:00
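The idea behind DEBUG_POISON_POINTER is small enough to sketch. Overwrite a pointer known to refer to stale data with a junk value. The value should be non-NULL so NULL checks don't hide the bug, and it should be unlikely to be mapped so a stray dereference traps. The macro names and the 0xdeadc0de value below are hypothetical. FreeBSD's actual macro, value and build conditions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical poison value: non-NULL and unlikely to be mapped. */
#define	TOY_POISON_VALUE	((uintptr_t)0xdeadc0deUL)

/* Overwrite a stale pointer with junk; compiled out without debugging. */
#ifdef NDEBUG
#define	TOY_POISON_POINTER(p)	((void)0)
#else
#define	TOY_POISON_POINTER(p)	((p) = (void *)TOY_POISON_VALUE)
#endif
```

After TOY_POISON_POINTER(p), any later use of p is expected to fault on a recognizable address rather than silently read reused memory.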
Warner Losh
fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes, the now-empty #if/#endif
pairs, and the runs of blank lines left behind.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh
5b31cc94b1 sccs: Manual changes
For the uncommon items: go through the tree and remove SCCS tags that
didn't fit any nice pattern.  If in the neighborhood, other SCM tags were
removed when they were detritus of long-ago CVS somehow in the early
mists of the project.  Some adjacent copyright strings were removed (they
duplicated the copyright notices in the file).  This also removed
non-standard forms of SCCS tag omission (usually an extra #if 0 added
somewhere).

After this commit, a number of strings tagged with the 'what' @(#)
prefix remain, but they are primarily copyright notices.

Sponsored by:		Netflix
2023-11-26 22:23:58 -07:00
Warner Losh
29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree using automated scripting, with
two minor fixups to keep things compiling.  All the common forms in the
tree were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
John Baldwin
ed88eef140 new-bus: Disable assertions for rman mismatches for activate/deactivate
Bus drivers which use an rman to sub-divide a resource allocated from
a parent bus should handle mapping requests (and activate/deactivate
requests) for those sub-allocated resources by doing a subset mapping
of the resource allocated from the parent (and then using this to
handle activate/deactivate requests).

However, not all bus drivers which use internal rmans (such as acpi(4)
and pci_pci(4)) do that since not all nexus drivers support
bus_map/unmap.  Eventually bus drivers should be updated to do this
properly at which point these assertions can be reenabled.

Reported by:	delphij, kib
2023-11-25 10:32:19 -08:00
John Baldwin
46971d38de new-bus: Add comments for resource_*_map_request*
Requested by:	mhorne
2023-11-24 10:33:57 -08:00
John Baldwin
00b3cde596 new-bus: Add a comment for bus_generic_get_domain
2023-11-24 10:33:57 -08:00
John Baldwin
751615c538 newbus: Add a set of bus resource helpers for nexus-like devices
These routines can be used to implement
bus_alloc/adjust/activate/deactivate/release_resource on bus drivers
which suballocate resources from rman(9) resource managers.

These methods require a new bus_get_rman method in the bus driver to
return the suitable rman for a given resource type.  The
activate/deactivate helpers also require the bus to implement the
bus_map/unmap_resource methods.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D42739
2023-11-24 09:28:00 -08:00
Konstantin Belousov
393ac29f0b kqueue: on process exit, force-clear its registered signal events
Normally, a process already has all its kqueue fds destroyed at the moment
p_klist is detached in the exit flow.  But if the process was created with
rfork(2) with shared file descriptors, its signal knotes can survive.
Then knlist_detach() does not destroy the non-empty knlist.  Later, when
the owning kqueue is closed, we access freed (or rather, reused, because
struct proc is type-stable) memory by referencing p->p_klist from such a
knote.

Handle this situation by deleting all knotes hanging from p_klist.

PR:	275286
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42745
2023-11-24 18:26:53 +02:00
John Baldwin
19f073c612 new-bus: Add resource_validate_map_request function
This helper function for BUS_MAP_RESOURCE performs common argument
validation.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D42723
2023-11-23 09:06:24 -08:00
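The kind of argument validation a helper like resource_validate_map_request() centralizes can be sketched as a bounds check. The check asks whether a request for 'length' bytes at 'offset' fits inside a resource spanning the inclusive range [rstart, rend], and it is written so the additions cannot overflow. The function name and its parameters are hypothetical. Treating a zero length as an error is this sketch's own choice, and the real helper's checks and corner cases may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check: does [offset, offset + length) fit inside a
 * resource covering the inclusive range [rstart, rend]?
 */
static int
toy_validate_map_request(uint64_t rstart, uint64_t rend, uint64_t offset,
    uint64_t length)
{
	uint64_t span;

	assert(rend >= rstart);
	span = rend - rstart;		/* last valid offset; cannot wrap */
	if (length == 0)
		return (EINVAL);	/* sketch's choice: empty map rejected */
	/* Compare against what remains instead of computing offset + length. */
	if (offset > span || length - 1 > span - offset)
		return (EINVAL);
	return (0);
}
```

Comparing `length - 1` against `span - offset` keeps the check correct even for requests near UINT64_MAX, where `offset + length` would wrap.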
Mitchell Horne
4e78a766f6 kern_reboot(): don't clear kdb_active
It is possible to reach this function from ddb via the "reset" command.
When this happens, we don't actually exit kdb, meaning we never execute
the latter steps of kdb_break() to restore the system state (e.g.
re-enable scheduler).

Therefore, we should not clear the kdb_active flag in this function, as
the debugger is still active. Put differently, kern_reboot() is not an
authority on kdb state, and should not touch it. The original motivation
for this assignment is not clear; I have checked thoroughly and I am
convinced it is not required by any reset code.

This fixes an edge case where a panic can be triggered during reset from
ddb:
 1. Enter ddb via keyboard break sequence (KERNEL_PANICKED() == false &&
    td->td_critnest > 0)
 2. Execute the "reset" command
 3. kern_reboot() sets kdb_active = false
 4. A witness_checkorder() call via shutdown handler sees !kdb_active
    and panics

Reviewed by:	imp, markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D42684
2023-11-23 12:07:43 -04:00
Mitchell Horne
960612a19f shutdown: tweak kproc/kthread shutdown check
This is to handle the case where the system has not panicked but the
debugger is active, where we still can't wait for thread termination.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D42683
2023-11-23 12:07:43 -04:00
Mitchell Horne
d79a9edb5c alq, siftr: add panic/debugger checks to shutdown hooks
Don't try to gracefully terminate the pkt_manager thread if the
scheduler is not running.

We should not attempt to shut down ald if RB_NOSYNC is set, and must not
if the scheduler is stopped (the function calls wakeup()).

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D42340
2023-11-23 12:07:42 -04:00
Rick Macklem
f5f277728a nfsd: Fix NFS access to .zfs/snapshot snapshots
When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.

This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount.  A separate patch, currently a pull request
for OpenZFS, calls this function to fix the problem.

PR:	275200
Reviewed by:	markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42672
2023-11-23 07:23:33 -08:00
Alexander Motin
0a7139485c Replace random sbuf_printf() with cheaper cat/putc.
2023-11-22 17:27:17 -05:00
Olivier Certner
7fa08d4152 kern_racct.c: Don't compile if RACCT undefined
Just skip compiling this file if RACCT isn't defined.  This allows
skipping the inclusion of headers that no code uses at all, and also
removing the whole file's #ifdef/#endif bracketing.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   The FreeBSD Foundation
2023-11-22 14:17:17 -05:00
Olivier Certner
e0205aa325 kern_rctl.c: Minimal includes when RCTL not defined
If RCTL is not defined, only the system call stubs returning ENOSYS are
compiled in.  In this case, don't waste time including most headers
since their code is not used.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   The FreeBSD Foundation
2023-11-22 14:17:17 -05:00
Olivier Certner
9d882de2da Remove sysctl 'kern.smp.forward_signal_enabled'
It seems this was an "emergency" knob to revert a newly introduced
behavior.  Overall, we want better system-wide signal receive latency,
and it doesn't seem that some contrary policy was ever needed (and if
that comes up, it should rather be implemented, e.g., per-process).

Suggested by:           kib
Reviewed by:            kib, jhb
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42315
2023-11-21 13:25:34 -05:00
Konstantin Belousov
26b36a64be sysctl kern.supported_archs: return correct value
when COMPAT_FREEBSD32 is enabled in the kernel config but the hardware
does not support executing 32-bit binaries.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42641
2023-11-21 13:56:06 +02:00
Mateusz Guzik
305a2676ae vfs: dodge locking for lseek(fd, 0, SEEK_CUR)
This call is very common: according to dtrace, while running poudriere
almost all calls with SEEK_CUR pass an offset of 0.
2023-11-19 22:25:45 +00:00
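The optimization in 305a2676ae can be illustrated with a userland toy. lseek(fd, 0, SEEK_CUR) only reads the offset, so it can return an atomic snapshot without taking the file's offset lock. Every other request still takes the lock. struct toy_file, its spinlock and toy_lseek() are hypothetical. The spin loop stands in for whatever lock the kernel really uses, and SEEK_END is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>		/* SEEK_SET, SEEK_CUR */
#include <sys/types.h>

/* Hypothetical file object whose offset can be read without the lock. */
struct toy_file {
	_Atomic off_t	offset;
	atomic_flag	lock;
	unsigned	lock_acquisitions;	/* demo counter, updated under lock */
};

static void
toy_lock(struct toy_file *fp)
{
	while (atomic_flag_test_and_set(&fp->lock))
		;	/* spin; a kernel would use a real sleepable lock */
	fp->lock_acquisitions++;
}

static void
toy_unlock(struct toy_file *fp)
{
	atomic_flag_clear(&fp->lock);
}

static off_t
toy_lseek(struct toy_file *fp, off_t off, int whence)
{
	off_t res;

	/* Fast path: a pure query of the current offset needs no lock. */
	if (whence == SEEK_CUR && off == 0)
		return (atomic_load(&fp->offset));

	toy_lock(fp);
	switch (whence) {
	case SEEK_SET:
		res = off;
		break;
	case SEEK_CUR:
		res = atomic_load(&fp->offset) + off;
		break;
	default:
		res = -1;	/* SEEK_END etc. not modeled */
		break;
	}
	if (res >= 0)
		atomic_store(&fp->offset, res);	/* negative results rejected */
	toy_unlock(fp);
	return (res);
}
```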
John Baldwin
3eed4803f9 vfs mount: Consistently use ENODEV internally for an invalid fstype
Change vfs_byname_kld to always return an error value of ENODEV to
indicate an unsupported fstype leaving ENOENT to indicate errors such
as a missing mount point or invalid path.  This allows nmount(2) to
better distinguish these cases and avoid treating a missing device
node as an invalid fstype after commit 6e8272f317.

While here, change mount(2) to return EINVAL instead of ENODEV for an
invalid fstype to match nmount(2).

PR:		274600
Reviewed by:	pstef, markj
Differential Revision:	https://reviews.freebsd.org/D42327
2023-11-18 11:08:34 -08:00
Brooks Davis
54d487c4d0 makesyscalls: don't make syscall.mk by default
We only want to produce syscall.mk for the main syscall table so default
to not producing it (send it to /dev/null) and add a syscalls.conf to
sys/kern to trigger the creation of sys/sys/syscall.mk.  This eliminates
the need for entries in other syscalls.conf files and is a cleaner
pattern going forward.

Reviewed by:	kevans, imp
Differential Revision:	https://reviews.freebsd.org/D42663
2023-11-18 00:48:14 +00:00
Mike Karels
415c1c748d khelp: suppress useless warning message on shutdown
If a module (e.g. the ertt hhook for TCP) can't clean up at
shutdown, there is nothing to be done about it.  In the ertt case,
cleanup is just shutting down a UMA zone, which doesn't need to be
done.  Suppress EBUSY warnings on shutdown.

PR:		271677
Reviewed by:	tuexen, imp
Differential Revision:	https://reviews.freebsd.org/D42650
2023-11-17 12:51:18 -06:00
Konstantin Belousov
22bac49b09 vn_lock_pair(): reasonably handle vp1 == vp2 case
Lock the vnode in the most exclusive lock mode requested, once.
All callers already ensure that vp1 != vp2 or are careful enough to only
unlock once otherwise.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42642
2023-11-17 03:51:41 +02:00
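The vp1 == vp2 rule from 22bac49b09 is shown here with toy types. When both arguments name the same vnode, it is locked exactly once, in the stronger of the two requested modes. The lock modes, struct toy_vnode and toy_lock_pair() are hypothetical. The distinct-vnode branch orders by address only to keep the sketch short. FreeBSD's vn_lock_pair() avoids deadlock with its own strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lock modes, ordered so a larger value is "more exclusive". */
enum toy_lkmode { TOY_LK_SHARED = 1, TOY_LK_EXCLUSIVE = 2 };

struct toy_vnode {
	enum toy_lkmode	held;		/* 0 when unlocked */
	int		lock_calls;	/* how many times it was locked */
};

static void
toy_vn_lock(struct toy_vnode *vp, enum toy_lkmode mode)
{
	assert(vp->held == 0);		/* no recursion in this toy */
	vp->held = mode;
	vp->lock_calls++;
}

static void
toy_lock_pair(struct toy_vnode *vp1, enum toy_lkmode m1,
    struct toy_vnode *vp2, enum toy_lkmode m2)
{
	if (vp1 == vp2) {
		/* Same vnode: lock once, in the most exclusive mode requested. */
		toy_vn_lock(vp1, m1 > m2 ? m1 : m2);
		return;
	}
	/* Distinct vnodes: a fixed (address) order avoids lock-order deadlock. */
	if ((uintptr_t)vp1 > (uintptr_t)vp2) {
		toy_vn_lock(vp2, m2);
		toy_vn_lock(vp1, m1);
	} else {
		toy_vn_lock(vp1, m1);
		toy_vn_lock(vp2, m2);
	}
}
```

As the commit notes, callers then have to unlock such a pair only once.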
Konstantin Belousov
e256f71389 kernel: add missing FEATUREs compat_freebsd 8-14
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-11-17 00:04:55 +02:00
Jonathan T. Looney
884eeff20c genoffset.sh: fix build break on MacOS
Switch from using the shell's builtin echo command to using the
builtin printf command to print the asserts.

Reported by:	jrtc27
Suggested by:	imp
Fixes:	accfb4cc93
Sponsored by:	Netflix
2023-11-16 17:54:28 +00:00
Jonathan T. Looney
accfb4cc93 genoffset.sh: stop using a temporary file
Instead, use a here document for the input. This allows us to run the
while loop in the main script so we can build the list of asserts in
a shell variable. We then print out the list of asserts at the end of
the loop.

Reviewed by:	imp
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D42407
2023-11-16 15:02:32 +00:00
John Baldwin
9b57e30cf5 abort2: Generate a core dump
Call sigexit rather than exit1 so that a core is generated.

If running the SIGABRT handler is desired, this would need to use
kern_psignal() instead.  In that case a userspace wrapper in libc
would be needed to force an exit if the handler doesn't exit.  Given
that abort2(2)'s intended use case is when userland is in a
sufficiently bad state such that it can't safely call syslog(3) before
abort(3), a userspace abort2(3) wrapper in libc might be dubious.

Reviewed by:	Olivier Certner <olce.freebsd@certner.fr>, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D42163
2023-11-13 17:00:52 -08:00
Konstantin Belousov
23210f538a vn_copy_file_range(): busy both in and out mp around call to VOP_COPY_FILE_RANGE()
This is required e.g. for nullfs to ensure liveness of the lower mount
points.

Reviewed by:	jah, rmacklem, Olivier Certner <olce.freebsd@certner.fr>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42554
2023-11-14 00:26:34 +02:00
Konstantin Belousov
89188bd6ba vn_copy_file_range(): use local variables for invp/outvp vnodes v_mounts
This avoids possible NULL dereference when checking mnt_vfc names.

Reviewed by:	jah, rmacklem, Olivier Certner <olce.freebsd@certner.fr>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42554
2023-11-14 00:26:28 +02:00
Brooks Davis
f64a688dfd Remove gratuitous copyouts of unchanged struct mac.
The get operations change the data pointed to by the structure, but do
not update the contents of the struct.

Mark the struct mac arguments of mac_[gs]etsockopt_*label() and
mac_check_structmac_consistent() const to prevent this from changing
in the future.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14488
2023-11-13 21:32:15 +00:00
Bojan Novković
c6d7be2148 tty: properly check character position when handling IUTF8 backspaces
The tty_rubchar() code handling backspaces for UTF-8 characters didn't
properly check whether the beginning of the current line was reached.
This resulted in a kernel panic in ttyinq_unputchar() when prodded with
certain malformed UTF-8 sequences.

Fixes:		PR 275009
Reviewed by:	christos
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42564
2023-11-13 20:04:11 +02:00
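The bound check that c6d7be2148 adds to IUTF8 backspace handling can be shown on a flat byte array. Skip UTF-8 continuation bytes (10xxxxxx) backwards to find the start of the last character, but never step past the start of the line, even when the input is malformed. The helper utf8_rub_len() is hypothetical. tty_rubchar() itself works on the tty input queue, not a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* True for UTF-8 continuation bytes (10xxxxxx). */
static int
is_utf8_cont(unsigned char c)
{
	return ((c & 0xC0) == 0x80);
}

/*
 * Hypothetical helper: number of bytes a backspace should erase to
 * remove the last character of line[0..len).  The walk stops at the
 * start of the line even if every byte is a continuation byte.
 */
static size_t
utf8_rub_len(const unsigned char *line, size_t len)
{
	size_t pos;

	if (len == 0)
		return (0);
	pos = len - 1;			/* last byte of the line */
	while (pos > 0 && is_utf8_cont(line[pos]))
		pos--;
	return (len - pos);
}
```

Without the `pos > 0` condition, a line made only of continuation bytes would walk off the front of the buffer. That is the class of malformed input the commit guards against.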
Warner Losh
20f8814cd3 busdma: On systems that use subr_busdma_bounce, measure deferred time
Measure the total deferred time (from the time we decide to defer until
we try again) for busdma_load requests. On systems that don't ever
defer, there is no performance change.  Add a new sysctl
hw.busdma.zoneX.total_deferred_time to report this (in
microseconds).

Normally, deferrals don't happen in modern hardware... Except there's a
lot of buggy hardware that can't cope with memory > 4GB or that can't
cross a 4GB boundary (or even more restrictive values), necessitating
bouncing. This will measure the effect on the I/Os of this deferral.

Sponsored by:		Netflix
Reviewed by:		gallatin, mav
Differential Revision:	https://reviews.freebsd.org/D42550
2023-11-13 07:23:53 -07:00
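The accounting described in 20f8814cd3 has two steps. First, record a timestamp when a load is deferred. Then, when it is retried, add the elapsed microseconds to a per-zone total. The structs and function names below are hypothetical, and timestamps are passed in so the arithmetic is easy to check. In practice they might come from clock_gettime(CLOCK_MONOTONIC, ...). subr_busdma_bounce.c's actual bookkeeping and clock source may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-zone statistic, like total_deferred_time. */
struct toy_bounce_zone {
	uint64_t	total_deferred_us;	/* cumulative, microseconds */
};

/* Hypothetical per-request state. */
struct toy_load_req {
	struct timespec	queued_at;	/* when we decided to defer */
};

static uint64_t
ts_diff_us(const struct timespec *start, const struct timespec *end)
{
	int64_t ns;

	ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000 +
	    ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
	assert(ns >= 0);	/* monotonic clock: end is not before start */
	return ((uint64_t)ns / 1000);
}

/* Called when the load must wait for bounce pages. */
static void
toy_defer(struct toy_load_req *req, const struct timespec *now)
{
	req->queued_at = *now;
}

/* Called when the deferred load is tried again. */
static void
toy_retry(struct toy_bounce_zone *bz, const struct toy_load_req *req,
    const struct timespec *now)
{
	bz->total_deferred_us += ts_diff_us(&req->queued_at, now);
}
```

Systems that never defer never call toy_defer() or toy_retry(), which matches the commit's claim of no performance change for them.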
Andrew Turner
eb32c1c75a sysent: Add sv_protect
To allow for architecture-specific protections, add sv_protect to struct
sysent. This can be used to apply these after the executable is loaded
into the new address space.

Reviewed by:	kib
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42440
2023-11-10 09:57:45 +00:00
Andrew Turner
a04633cef8 imgact_elf: Export __elfN(parse_notes)
This is useful to check if a note is present and contains an expected
value, e.g. to read NT_GNU_PROPERTY_TYPE_0 on arm64 to see if we should
enable BTI.

Reviewed by:	kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42439
2023-11-10 09:57:45 +00:00
Andrew Turner
9d2612fc2a imgact_elf: Move GNU_ABI_VENDOR to a common header
Move the definition of GNU_ABI_VENDOR to a common location so it can
be used in multiple files.

Reviewed by:	emaste, kib, imp
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42442
2023-11-10 09:57:45 +00:00
Alexander Motin
a03c23931e uma: Improve memory modified after free panic messages
 - Pass the zone pointer to trash_ctor() and report the zone name in the panic
   message.  It may be difficult to figure out the zone just by the item size.
 - Do not pass user arguments to internal trash calls; pass the zone.
 - Report the malloc type name in the same unified panic message.
 - Report the corruption offset from the beginning of the item instead of
   the full pointer.  This makes the panic message shorter and more readable.
2023-11-09 19:46:26 -05:00
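The trash-and-check technique behind these panic messages can be sketched as follows. Fill a freed item with a junk pattern. On the next allocation, verify the pattern and report the offset of the first modified byte, not a full pointer. The byte-wise 0xde pattern and the function names are hypothetical. UMA's own pattern and word size may differ, and a real implementation would panic with the zone name and this offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical junk pattern written over freed items. */
#define	TOY_TRASH_BYTE	0xde

/* On free: fill the whole item with the junk pattern. */
static void
toy_trash_fill(void *item, size_t size)
{
	memset(item, TOY_TRASH_BYTE, size);
}

/*
 * On reuse: return the offset (from the start of the item) of the first
 * byte that no longer holds the pattern, or -1 if the item is intact.
 */
static long
toy_trash_check(const void *item, size_t size)
{
	const unsigned char *p = item;

	for (size_t i = 0; i < size; i++)
		if (p[i] != TOY_TRASH_BYTE)
			return ((long)i);
	return (-1);
}
```

An offset such as 40 into a 64-byte item is easy to match against a struct layout, which is the readability gain the commit describes.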
Konstantin Belousov
ede4c412b3 vfs_domount_update(): ensure that 'goto end' works
We need to vfs_op_enter()/vn_seqc_write_start() before jumping to
cleanup.

PR:	274992
Reported by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Fixes:	9ef7a491a4
2023-11-09 22:18:47 +02:00
Alexander Motin
1f8a5187ff ktls: Remove unneeded vm/uma_dbg.h include
It was used in the original implementation, but is no longer.

MFC after:	2 weeks
2023-11-09 13:53:07 -05:00
Zhenlei Huang
ecf710f0e0 kern linker: Do not retry loading modules on EEXIST
LINKER_LOAD_FILE() calls linker_load_dependencies() which will return
EEXIST in case the module to be loaded has already been compiled into
the kernel. Since the format of the module is now recognized then there
is no need to retry loading with a different linker, otherwise the
userland will get misleading error number ENOEXEC.

PR:		274936
Reviewed by:	dfr
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D42474
2023-11-07 12:45:25 +08:00
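The retry rule from ecf710f0e0 is shown here as a toy loop over linker classes. Success or EEXIST ends the search, because the format was recognized. Other errors move on to the next class. ENOEXEC is reported only if no class accepted the file. This is not kern_linker.c. The typedef, the toy back ends and the policy for errors other than EEXIST are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical loader back end (one per object format). */
typedef int (*toy_linker_load_t)(const char *path);

static int
toy_load_file(const toy_linker_load_t *linkers, size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++) {
		int error = linkers[i](path);

		/* Success, or recognized but already in the kernel: stop. */
		if (error == 0 || error == EEXIST)
			return (error);
		/* Otherwise this linker did not take the file; try the next. */
	}
	return (ENOEXEC);	/* no linker accepted the file */
}

/* Toy back ends used to exercise the loop (hypothetical). */
static int toy_elf_dup(const char *path) { (void)path; return (EEXIST); }
static int toy_other_fmt(const char *path) { (void)path; return (ENOEXEC); }
```

Before the fix, the EEXIST from the first back end would have been overwritten by a later format mismatch. Userland would then see ENOEXEC instead of learning that the module is already present.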