Commit graph

1692 commits

Author SHA1 Message Date
Gleb Smirnoff
17083b94a9 netlink: use protocol specific receive buffer
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's,
same part of struct sockbuf that is used for send buffer already.
This shaves a lot of code and a lot of extra processing.  The pcb rids
of the I/O queues as the socket buffer is exactly the queue.  The
message writer is simplified a lot, as we now always deal with linear
buf.  Notion of different buffer types goes away as way as different
kinds of writers.  The only things remaining are: a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
a workaround from d187154750 disappears.

Note on message throttling.  Now the taskqueue throttling mechanism
needs to look at both socket buffers protected by their respective
locks and on flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this changes tries
to preserve as much as possible.

Note on new nl_soreceive().  It emulates soreceive_generic().  It
must undergo further optimization, see large comment put in there.

Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed.  However, I left
it with minimal functionality (it basically checks that allocating N
bytes we get N bytes) as it is one of not so many examples of ktest
framework that allows to test KPIs with python.

Note on Linux support. It got much simplier: Netlink message writer
loses notion of Linux support lifetime, it is same regardless of
process ABI.  On socket write from Linux process we perform
conversion immediately in nl_receive_message() and on an output
conversion to Linux happens in in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42524
2024-01-02 13:04:01 -08:00
Mark Johnston
b9924c202f linux: Check for copyout errors in ioctl handlers
In preparation for annotating copyin() and friends with
__result_use_check.

Reviewed by:	dchagin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D43199
2023-12-27 10:13:15 -05:00
Gleb Smirnoff
f27aff8f7f linux/netlink: don't override sopt level
This override effectively prevents correct entering of netlink
protocol specific pr_ctloutput in sosetopt().

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42520
2023-12-26 20:21:58 -08:00
Gleb Smirnoff
0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simplier syscalls that return socket addresses.  Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
Gleb Smirnoff
cfb1e92912 sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.

While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
required length in case if provided length was insufficient.  Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do.  Linux also does that. Update tests accordingly.

Reviewed by:		rscheff, tuexen, zlei, dchagin
Differential Revision:	https://reviews.freebsd.org/D42635
2023-11-30 08:30:55 -08:00
Kristof Provost
ab393e9548 netlink: move NETLINK define to opt_global.h
Move the NETLINK define into opt_global.h so we can rely on it being
set correctly, without having to remember to include opt_netlink.h.
This ensures that the NETLINK define is correctly set. If not we
may end up with unloadable modules, due to missing symbols (such as
nlmsg_get_group_writer).

PR:		274306
Reviewed by:	imp, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42179
2023-10-13 09:23:47 +02:00
Dmitry Chagin
1ee29160c5 linux(4): Fix semid64_ds structure layout
Unlike x86_64, other 64-bit architectures do not have paddings
for the time fields.

MFC after:		1 week
2023-10-04 21:21:12 +03:00
Dmitry Chagin
fbb3f13b15 linux(4): Actually delete linux_sysproto.h
MFC after:		1 week
2023-10-03 11:26:17 +03:00
Dmitry Chagin
199e397e9b linux(4): Deorbit linux_nosys
Differential Revision:	https://reviews.freebsd.org/D41901
MFC after:		1 week
2023-10-03 10:38:03 +03:00
Dmitry Chagin
0a16d3d14d linux(4): Update syscalls.master to 6.5
MFC after:		1 week
2023-09-25 12:24:58 +03:00
Dmitry Chagin
227d01c1bc linux(4): On Linux SIGKILL can not be reset to default
MFC after:		1 week
2023-09-18 17:53:01 +03:00
Dmitry Chagin
794328fbc1 linux(4): Staticize lsiginfo_to_siginfo
It's not used outside of linux_signal.c
While here fix the indentation.

MFC after:		1 week
2023-09-18 17:52:43 +03:00
Dmitry Chagin
2a1cf1b6b5 linux(4): Deduplicate mmap2
To help porting the Linux emulation layer to a new platforms start using
Linux names for conditional builds instead of architecture-specific ifdefs.

MFC after:		1 week
2023-09-05 21:16:39 +03:00
Dmitry Chagin
553b1a4e4e linux(4): Deduplicate mprotect, madvise
MFC after:		1 week
2023-09-05 21:15:52 +03:00
Vico Chen
aadc14bceb linux(4): Convert flags in timerfd_create
The timerfd is introduced in FreeBSD 14, and the Linux ABI timerfd is
also moved to FreeBSD native timerfd, but it can't work well as Linux
TFD_CLOEXEC and TFD_NONBLOCK haven't been converted to FreeBSD
TFD_CLOEXEC and TFD_NONBLOCK.

Reviewed by:		dchagin, jfree
Differential revision:	https://reviews.freebsd.org/D41708
MFC after:		1 week
2023-09-05 11:53:02 +03:00
Dmitry Chagin
11e37048db linux(4): Return ENOTSUP from listxattr instead of EPERM
FreeBSD does not permits manipulating extended attributes in the system
namespace by unprivileged accounts, even if account has appropriate
privileges to access filesystem object.
In Linux the system namespace is used to preserve posix acls. Some Gnu
coreutils binaries uses posix acls, eg, install, ls, cp.  And fails if
we unexpectedly return EPERM error from xattr system calls.

In the other hands, in Linux read and write access to the system
namespace depend on the policy implemented for each filesystem, so we'll
mimics we're a filesystem that prohibits this for unpriveleged accounts.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-05 11:52:27 +03:00
Dmitry Chagin
18d1c86788 linux(4): Fix listxattr for the case when the size is 0
If size is specified as zero, these calls return the current size
of the list of extended attribute names (and leave list unchanged).

Tested by:		zirias
MFC after:		1 week
2023-09-05 11:51:46 +03:00
Dmitry Chagin
1bfc4574f7 linux(4): Return ENOTSUP from xattr syscalls instead of EPERM
FreeBSD does not permits manipulating extended attributes in the system
namespace by unprivileged accounts, even if account has appropriate
privileges to access filesystem object.
In Linux the system namespace is used to preserve posix acls. Some Gnu
coreutils binaries uses posix acls, eg, install, ls.  And fails if we
unexpectedly return EPERM error from xattr system calls.

In the other hands, in Linux read and write access to the system
namespace depend on the policy implemented for each filesystem, so we'll
mimics we're a filesystem that prohibits this for unpriveleged accounts.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:11:02 +03:00
Dmitry Chagin
dfcc0237c3 linux(4): Merge removexattr for future error recode
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:10:44 +03:00
Dmitry Chagin
4d59b79055 linux(4): Return ENODATA from getxattr syscalls instead of EPERM
On Linux ENODATA mean the named attribute does not exist, or the
process has no access to this attribute.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:10:12 +03:00
Dmitry Chagin
6b46ec6612 linux(4): Merge getxattr for future error recode
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:09:49 +03:00
Jake Freeland
af93fea710 timerfd: Move implementation from linux compat to sys/kern
Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459
2023-08-24 14:28:56 -06:00
Dmitry Chagin
524c9accdc linux(4): Replace linux32_copyiniov by freebsd32_copyiniov
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin
c987ff4d7b linux(4): Replace linux32_copyinuio by freebsd32_copyinuio
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin
d6cb9e728b linux(4): Return EAGAIN instead of ENOBUFS for non-blocking sockets in pwrite
MFC after:		1 month
2023-08-20 10:36:31 +03:00
Dmitry Chagin
dfbb3e2aae linux(4): Return EAGAIN instead of ENOBUFS for non-blocking sockets in pwritev
MFC after:		1 month
2023-08-20 10:36:31 +03:00
Dmitry Chagin
4231b825ac linux(4): Add a dedicated writev syscall wrapper
Adding a writev syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:31 +03:00
Dmitry Chagin
e58ff66464 linux(4): Add a write syscall wrapper
Adding a write syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:29 +03:00
Dmitry Chagin
a129642ced linux(4): Fix linker warning about undefined vdso symbols
Pointed out by:		bz
MFC after:		3 days
2023-08-20 00:48:36 +03:00
Dmitry Chagin
4f9fac78d4 linux(4): Return EAGAIN instead of ENOBUFS for non-blocking sockets in sendfile
MFC after:		1 month
2023-08-19 21:55:23 +03:00
Dmitry Chagin
da5a6738d5 linux(4): Allow in fd to be a socket in sendfile
In this case sendfile fallback is used.

MFC after:		1 month
2023-08-19 21:55:23 +03:00
Dmitry Chagin
110be11ac9 linux(4): Remove include of sys/types.h from linux_vdso.h
Due to sys/param.h includes sys/types.h and the fact that the sys/param.h
is included everywhere where linux_vdso.h is needed.
2023-08-18 15:58:32 +03:00
Dmitry Chagin
2be88e2cca linux(4): Follow style(9), don't include both sys/param.h and sys/types.h 2023-08-18 15:58:32 +03:00
Dmitry Chagin
3460fab5fc linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743 2023-08-18 13:12:02 +03:00
Dmitry Chagin
c47116e909 linux(4): Update my copyrights, add SPDX tag 2023-08-17 23:54:36 +03:00
Dmitry Chagin
270e01d468 linux(4): Fix leftovers after 2ff63af9 2023-08-17 23:54:00 +03:00
Dmitry Chagin
6ecab39494 linux(4): Drop bogus __arm__ condition due to lack of 32-bit arm support
MFC after:		1 month
2023-08-17 22:57:17 +03:00
Dmitry Chagin
4a521544a6 linux(4): Don't miss error from underlying in sendfile
MFC after:		1 month
2023-08-17 22:57:17 +03:00
James McLaughlin
bb66c59753 linux(4): Add sendfile fallback for non-socket fds
Before Linux 2.6.33, out_fd must refer to a socket. Since Linux 2.6.33
it can be any file.
The patch was originally provided by James McLaughlin and adapted by me
for copy_file_range.

PR:			262535
Differential revision:	https://reviews.freebsd.org/D34555
MFC after:		1 month
2023-08-17 22:57:17 +03:00
Dmitry Chagin
7307c43963 linux(4): Use native off_t for fo_sendfile call
MFC after:		1 month
2023-08-17 22:57:17 +03:00
Alvin Chen
5ad5cf5079 linux(4): Be verbose about unsupported ioctl commands on ifreq ioctl
Differential revision:	https://reviews.freebsd.org/D39786
MFC after:		1 month
2023-08-17 22:57:16 +03:00
Alvin Chen
1f2b31f76e linux(4): Add 2 Linux socket ioctl commands
Support 2 Linux socket ioctl commands: SIOCGIFMETRIC, SIOCSIFMETRIC.

Differential revision:	https://reviews.freebsd.org/D39786
MFC after:		1 month
2023-08-17 22:57:16 +03:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh
2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Dmitry Chagin
2467ccddc0 linux(4): Fix MSG_CTRUNC handling in recvmsg()
The MSG_CTRUNC flag of the msg_flags member of the message header is
set uppon successful completition if the control data was truncated.
Upon return from a successful call msg_controllen should contain the
length of the control message sequence.

Fixes:		0eda2cea
MFC after:	1 week
2023-08-14 15:46:12 +03:00
Dmitry Chagin
9d0c9b6d6a linux(4): Add a comment explaining udata freeing on error
MFC after:	1 week
2023-08-14 15:46:12 +03:00
Dmitry Chagin
de20eb26d0 linux(4): Refactor recvmsg
As the amount of handled anxiliary messages grows move they handlers
into a separate functions.

MFC after:	1 week
2023-08-14 15:46:12 +03:00
Dmitry Chagin
bbaa5523c0 linux(4): Skip unsupported anxiliary message
Instead of returning error, skip unsupported anxiliary messages and
fail if no one handled.

MFC after:	1 week
2023-08-14 15:46:12 +03:00