Commit graph

1717 commits

Author SHA1 Message Date
Ed Maste
3cf834d069 linuxulator: ignore AT_NO_AUTOMOUNT for all stat variants
Commit ff39d74aa9 ignored AT_NO_AUTOMOUNT for statx(), but did not
change fstat64() or newfstatat(), which also take an equivalent flags
argument.  Add a linux_to_bsd_stat_flags() helper and use it in all
three places.

PR:		281526
Reviewed by:	trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46711
2024-09-24 13:58:42 -04:00
Fernando Apesteguía
5ab6ed93cd faccessat(2): Honor AT_SYMLINK_NOFOLLOW
Make the system call honor `AT_SYMLINK_NOFOLLOW`.

Also enable this from `linux_faccessat2` where the issue arised the first time.
Update manual pages accordingly.

PR:			275295
Reported by:		kenrap@kennethraplee.com
Approved by:		kib@
Differential Revision:	https://reviews.freebsd.org/D46267
2024-08-11 17:49:06 +02:00
Chuck Tuffli
ad9cc86bf6 linux: Translate Linux NVME ioctls to the lower layers.
The lower layers implement a ABI compatible Linux ioctl for a few of the
Linux IOCTLs. Translate them and pass them down. Since they are ABI
compatible, just use the nvme ioctl name.

Co-Authored-by: Warner Losh <imp@bsdimp.com>
Reviewed by:	chuck
Differential Revision:	https://reviews.freebsd.org/D45416
2024-06-14 16:40:20 -06:00
Andrew Turner
ec69d23093 linux: Allows writing to the vdso from the kernel
We need to write to the vdso in the kernel to perform fixups. Move it
from .rodata to .data so these can be run.

Reported by:	cy
Sponsored by:	Arm Ltd
2024-06-06 09:07:49 +00:00
Andrew Turner
bed65d85c6 linux64: Fix the build on arm64 with bti checking
When we enable checking for BTI on arm64 we need to include an ELF
note in all object files linked into a module.

As using objcopy from a binary to an ELF object file doesn't add the
note switch to using .incbin from an assembly file. This allows us to
add the needed note without affecting the included object.

Reviewed by:	imp, kib, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45468
2024-06-05 09:23:40 +00:00
Val Packett
108de78451 Redefine CLOCK_BOOTTIME to alias CLOCK_MONOTONIC, not CLOCK_UPTIME
The suspend-awareness situation with monotonic clocks across platforms
is kind of a mess, let's try not making it worse.

On Linux, CLOCK_MONOTONIC does NOT count suspended time, and
CLOCK_BOOTTIME was introduced to INCLUDE suspended time.

On OpenBSD, CLOCK_MONOTONIC DOES count suspended time, and CLOCK_UPTIME
was introduced to EXCLUDE suspended time.

On macOS, it's the same as OpenBSD, but with CLOCK_UPTIME_RAW.

Right now, we do not have a monotonic clock that counts suspended time.
We have CLOCK_UPTIME as a distinct ID alias, and CLOCK_BOOTTIME as a
preprocessor alias, both being effectively `CLOCK_MONOTONIC` for now.

When we introduce a suspend-aware clock in the future, it would make a
lot more sense to do it the OpenBSD/macOS way, i.e. to make
CLOCK_MONOTONIC include suspended time and make CLOCK_UPTIME exclude it,
because that's what the name CLOCK_UPTIME implies: a deviation from the
default intended for the uptime command to allow it to only show the
time the system was actually up and not suspended.

Let's change the define right now to make sure software using the define
would not end up using the ID of the wrong clock in the future, and fix
the IDs in the Linux compat code to match the expected changes too.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1824084
for more discussion.

Fixes:		155f15118a ("clock_gettime: Add Linux aliases for CLOCK_*")
Fixes:		25ada63736 ("Map Linux CLOCK_BOOTTIME to native CLOCK_UPTIME.")
Sponsored by:	https://www.patreon.com/valpackett
Reviewed by:	kib, imp
Differential Revision:	https://reviews.freebsd.org/D39270
2024-05-31 08:45:02 -06:00
Warner Losh
c48820a448 minor style tweak.
checkstyle9 doesn't check for this construct...

Fixes: 6d849754b9
2024-05-29 09:53:20 -06:00
Son Phan Trung
6d849754b9 linux: implement PR_CHILD_SET_SUBREAPER
Reviewed by: imp, dchagin
Pull Request: https://github.com/freebsd/freebsd-src/pull/1260
2024-05-29 07:56:23 -06:00
Gleb Smirnoff
2780e5f43d linux: allow RTM_GETADDR without full ifaddrmsg argument
Even modern glibc uses truncated argument for RTM_GETADDR when it wants to
list all addresses in a system.  See
sysdeps/unix/sysv/linux/ifaddrs.c:__netlink_sendreq().  It sends a one
char payload.  Linux kernel allows that as long as given socket is not
marked as a 'strict'.  We have a similar flag in the general netlink code
and it is checked in
sys/netlink/netlink_message_parser.h:nl_parse_header().  If the flag is
not present, parser will allocate a temporary zeroed buffer to make the
message correct.  The checks added in b977dd1ea5 blocked such message
before the parser.  My reading of glibc says that there are two types of
messages that are sent with __netlink_sendreq() - RTM_GETLINK and
RTM_GETADDR.  The RTM_GETLINK is binary compatible between Linux and
FreeBSD and thus doesn't need any ABI handler.

PR:		279012
Fixes:		b977dd1ea5
2024-05-28 13:13:08 -07:00
Ricardo Branco
9a9677ec1c linux: Update linux manpage to mention mqueuefs
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:47 -06:00
Ricardo Branco
97add684f5 linux: Support POSIX message queues
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco
86e43b5def linux: Export linux_convert_l_sigevent
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco
9c7b1bf271 linux: Export linux_common_openflags()
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Zhenlei Huang
68c890b443 linux(4): Add const qualifier to the value parameter of function handle_string()
The content that `value` point to is not going to be altered by function
handle_string().

MFC after:	1 week
2024-05-20 12:02:33 +08:00
Ricardo Branco
a7cc56b28f linux: Adjust rlimit SIGPENDING & MSGQUEUE behaviour to match linprocfs
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1227
2024-05-10 14:50:04 -06:00
Andrew Gallatin
13a5a46c49 Fix new users of MAXPHYS and hide it from the kernel namespace
In cd85379104, kib made maxphys a load-time tunable.  This made
the #define MAXPHYS in sys/param.h  almost entirely obsolete, as
it could now be overridden by kern.maxphys at boot time, or by
opt_maxphys.h.

However, decades of tradition have led to several new, incorrect, uses
of MAXPHYS in other parts of the kernel, mostly by seasoned
developers.  I've corrected those uses here in a mechanical fashion,
and verified that it fixes a bug in the md driver that I was
experiencing.

Since using MAXPHYS is such an easy mistake to make, it is best to
hide it from the kernel namespace.  So I've moved its definition to
_maxphys.h, which is now included in param.h only for userspace.

That brings up the fact that lots of userspace programs use MAXPHYS
for different reasons, most of them probably wrong.  Userspace consumers
that really need to know the value of maxphys should probably be
changed to use the kern.maxphys sysctl.  But that's outside the scope
of this change.

Reviewed by: imp, jkim, kib, markj
Fixes: 30038a8b4e ("md: Get rid of the pbuf zone")
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44986
2024-04-30 15:29:06 -04:00
Lexi Winter
ca63710d36 linux: ignore setsockopt(IPV6_RECVERR)
Under Linux, the socket options IP_RECVERR and IPV6_RECVERR are used to
receive socket errors via a dedicated 'error queue' which can be
retrieved via recvmsg().  FreeBSD does not support this functionality.

For IPv4, the sysctl compat.linux.ignore_ip_recverr can be set to 1 to
silently ignore attempts to set IP_RECVERR and return success to the
application, which is wrong, but is required for (among other things)
a functional DNS client in recent versions of glibc.

Add support for ignoring IPV6_RECVERR, controlled by the same sysctl.
This fixes DNS in Linux when using IPv6 resolvers.

Reviewed by: imp, Jose Luis Duran
Pull Request: https://github.com/freebsd/freebsd-src/pull/1118
2024-04-22 22:36:34 -06:00
Brooks Davis
6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Gleb Smirnoff
b977dd1ea5 linux: make linux_netlink_p->msg_from_linux be able to fail
The KPI for this function was misleading.  From the NetLink perspective it
looked like a function that: a) allocates new hdr, b) can fail.  Neither
was true.  Let the function return a error code instead of returning the
same hdr it was passed to.  In case if future Linux NetLink compatibility
support calls for reallocating header, pass hdr as pointer to pointer.

With KPI that returns a error, propagate domain conversion errors all the
way up to NetLink module.  This fixes panic when unknown domain is
converted to 0xff and this invalid value is passed into NetLink
processing.

PR:			274536
Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D44392
2024-03-29 13:35:51 -07:00
Gleb Smirnoff
9d4a08d162 linux: use sa_family_t for address family conversions
Express "conversion failed" with maximum possible value.  This allows to
reduce number of size/signedness conversion in the code that utilizes the
functions.

PR:			274536
Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D44375
2024-03-29 13:35:37 -07:00
Gleb Smirnoff
2f5a315b30 linux: require vnet(9) context in ifname_bsd_to_linux_name()
This function is used by netlink(9) only.  The netlink(9) taskqueue thread
runs in the vnet of the socket whose request the thread is processing
right now.  This is a correct vnet and resetting it to vnet0 is incorrect.
If the function is to be used by any other caller in addition to
netlink(9), it would be caller's responsiblity to provide correct vnet(9).

Reviewed by:		melifaro, dchagin
Differential Revision:	https://reviews.freebsd.org/D44191
PR:			277286
2024-03-03 12:56:58 -08:00
Gleb Smirnoff
41ce9c8b88 netlink: restore original buffer if nlmsgs_to_linux() fails
Caller is responsible to free it or reuse.

Fixes:	17083b94a9
2024-02-27 12:45:54 -08:00
Konstantin Belousov
99fa799a19 linux_pwd_onexec: do not abort image activation if emul path does not exist
Instead clear the altroot, if any.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43833
2024-02-22 03:31:39 +02:00
Alfredo Mazzinghi
61cc4830a7 Abstract UIO allocation and deallocation.
Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify the sub-allocation layout of struct uio and the
corresponding iovec array.

Obtained from:	CheriBSD
Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	CHaOS, EPSRC grant EP/V000292/1
Differential Revision:	https://reviews.freebsd.org/D43711
2024-02-10 11:38:04 -05:00
rilysh
3221c44d80 sys/compat/linux/linux_misc.c: remove an extra semicolon
Signed-off-by: rilysh <nightquick@proton.me>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/959
2024-02-02 18:35:01 -07:00
Gleb Smirnoff
17083b94a9 netlink: use protocol specific receive buffer
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's,
same part of struct sockbuf that is used for send buffer already.
This shaves a lot of code and a lot of extra processing.  The pcb rids
of the I/O queues as the socket buffer is exactly the queue.  The
message writer is simplified a lot, as we now always deal with linear
buf.  Notion of different buffer types goes away as way as different
kinds of writers.  The only things remaining are: a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
a workaround from d187154750 disappears.

Note on message throttling.  Now the taskqueue throttling mechanism
needs to look at both socket buffers protected by their respective
locks and on flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this changes tries
to preserve as much as possible.

Note on new nl_soreceive().  It emulates soreceive_generic().  It
must undergo further optimization, see large comment put in there.

Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed.  However, I left
it with minimal functionality (it basically checks that allocating N
bytes we get N bytes) as it is one of not so many examples of ktest
framework that allows to test KPIs with python.

Note on Linux support. It got much simplier: Netlink message writer
loses notion of Linux support lifetime, it is same regardless of
process ABI.  On socket write from Linux process we perform
conversion immediately in nl_receive_message() and on an output
conversion to Linux happens in in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42524
2024-01-02 13:04:01 -08:00
Mark Johnston
b9924c202f linux: Check for copyout errors in ioctl handlers
In preparation for annotating copyin() and friends with
__result_use_check.

Reviewed by:	dchagin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D43199
2023-12-27 10:13:15 -05:00
Gleb Smirnoff
f27aff8f7f linux/netlink: don't override sopt level
This override effectively prevents correct entering of netlink
protocol specific pr_ctloutput in sosetopt().

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42520
2023-12-26 20:21:58 -08:00
Gleb Smirnoff
0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simplier syscalls that return socket addresses.  Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
Gleb Smirnoff
cfb1e92912 sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.

While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
required length in case if provided length was insufficient.  Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do.  Linux also does that. Update tests accordingly.

Reviewed by:		rscheff, tuexen, zlei, dchagin
Differential Revision:	https://reviews.freebsd.org/D42635
2023-11-30 08:30:55 -08:00
Kristof Provost
ab393e9548 netlink: move NETLINK define to opt_global.h
Move the NETLINK define into opt_global.h so we can rely on it being
set correctly, without having to remember to include opt_netlink.h.
This ensures that the NETLINK define is correctly set. If not we
may end up with unloadable modules, due to missing symbols (such as
nlmsg_get_group_writer).

PR:		274306
Reviewed by:	imp, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42179
2023-10-13 09:23:47 +02:00
Dmitry Chagin
1ee29160c5 linux(4): Fix semid64_ds structure layout
Unlike x86_64, other 64-bit architectures do not have paddings
for the time fields.

MFC after:		1 week
2023-10-04 21:21:12 +03:00
Dmitry Chagin
fbb3f13b15 linux(4): Actually delete linux_sysproto.h
MFC after:		1 week
2023-10-03 11:26:17 +03:00
Dmitry Chagin
199e397e9b linux(4): Deorbit linux_nosys
Differential Revision:	https://reviews.freebsd.org/D41901
MFC after:		1 week
2023-10-03 10:38:03 +03:00
Dmitry Chagin
0a16d3d14d linux(4): Update syscalls.master to 6.5
MFC after:		1 week
2023-09-25 12:24:58 +03:00
Dmitry Chagin
227d01c1bc linux(4): On Linux SIGKILL can not be reset to default
MFC after:		1 week
2023-09-18 17:53:01 +03:00
Dmitry Chagin
794328fbc1 linux(4): Staticize lsiginfo_to_siginfo
It's not used outside of linux_signal.c
While here fix the indentation.

MFC after:		1 week
2023-09-18 17:52:43 +03:00
Dmitry Chagin
2a1cf1b6b5 linux(4): Deduplicate mmap2
To help porting the Linux emulation layer to a new platforms start using
Linux names for conditional builds instead of architecture-specific ifdefs.

MFC after:		1 week
2023-09-05 21:16:39 +03:00
Dmitry Chagin
553b1a4e4e linux(4): Deduplicate mprotect, madvise
MFC after:		1 week
2023-09-05 21:15:52 +03:00
Vico Chen
aadc14bceb linux(4): Convert flags in timerfd_create
The timerfd is introduced in FreeBSD 14, and the Linux ABI timerfd is
also moved to FreeBSD native timerfd, but it can't work well as Linux
TFD_CLOEXEC and TFD_NONBLOCK haven't been converted to FreeBSD
TFD_CLOEXEC and TFD_NONBLOCK.

Reviewed by:		dchagin, jfree
Differential revision:	https://reviews.freebsd.org/D41708
MFC after:		1 week
2023-09-05 11:53:02 +03:00
Dmitry Chagin
11e37048db linux(4): Return ENOTSUP from listxattr instead of EPERM
FreeBSD does not permits manipulating extended attributes in the system
namespace by unprivileged accounts, even if account has appropriate
privileges to access filesystem object.
In Linux the system namespace is used to preserve posix acls. Some Gnu
coreutils binaries uses posix acls, eg, install, ls, cp.  And fails if
we unexpectedly return EPERM error from xattr system calls.

In the other hands, in Linux read and write access to the system
namespace depend on the policy implemented for each filesystem, so we'll
mimics we're a filesystem that prohibits this for unpriveleged accounts.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-05 11:52:27 +03:00
Dmitry Chagin
18d1c86788 linux(4): Fix listxattr for the case when the size is 0
If size is specified as zero, these calls return the current size
of the list of extended attribute names (and leave list unchanged).

Tested by:		zirias
MFC after:		1 week
2023-09-05 11:51:46 +03:00
Dmitry Chagin
1bfc4574f7 linux(4): Return ENOTSUP from xattr syscalls instead of EPERM
FreeBSD does not permits manipulating extended attributes in the system
namespace by unprivileged accounts, even if account has appropriate
privileges to access filesystem object.
In Linux the system namespace is used to preserve posix acls. Some Gnu
coreutils binaries uses posix acls, eg, install, ls.  And fails if we
unexpectedly return EPERM error from xattr system calls.

In the other hands, in Linux read and write access to the system
namespace depend on the policy implemented for each filesystem, so we'll
mimics we're a filesystem that prohibits this for unpriveleged accounts.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:11:02 +03:00
Dmitry Chagin
dfcc0237c3 linux(4): Merge removexattr for future error recode
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:10:44 +03:00
Dmitry Chagin
4d59b79055 linux(4): Return ENODATA from getxattr syscalls instead of EPERM
On Linux ENODATA mean the named attribute does not exist, or the
process has no access to this attribute.

Reported by:		zirias
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:10:12 +03:00
Dmitry Chagin
6b46ec6612 linux(4): Merge getxattr for future error recode
Tested by:		zirias
MFC after:		1 week
2023-09-01 11:09:49 +03:00
Jake Freeland
af93fea710 timerfd: Move implementation from linux compat to sys/kern
Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459
2023-08-24 14:28:56 -06:00
Dmitry Chagin
524c9accdc linux(4): Replace linux32_copyiniov by freebsd32_copyiniov
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin
c987ff4d7b linux(4): Replace linux32_copyinuio by freebsd32_copyinuio
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin
d6cb9e728b linux(4): Return EAGAIN instead of ENOBUFS for non-blocking sockets in pwrite
MFC after:		1 month
2023-08-20 10:36:31 +03:00