Commit graph

1360 commits

Author SHA1 Message Date
Alexander V. Chernikov
c7655e1f36 linux: add sysctl to pass untranslated interface names
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33792

(cherry picked from commit 1f70a85b4c)
2022-03-28 08:49:23 +00:00
Alexander V. Chernikov
0b6161db7e linux: fix linux_recvmsg() MSG_PEEK flag handling
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33790

(cherry picked from commit 96c524d8b2)
2022-03-28 08:49:13 +00:00
Edward Tomasz Napierala
f89bad7c9c linux: Fix ptrace panic with ERESTART
Translate ERESTART into Linux "internal" errno ERESTARTSYS.
This fixes the erestartsys.gen.test from strace(1).

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32623

(cherry picked from commit 6547153e46)
2022-02-21 13:36:11 +00:00
Edward Tomasz Napierala
37671ee46d linux: Make compat.linux.preserve_vstatus default to 1
From a user point of view, this makes ^T work out of the box.

Reviewed By:	debdrup (man page)
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D33842

(cherry picked from commit b896bdb86d)
2022-02-14 19:35:25 +00:00
Edward Tomasz Napierala
15acd92e8f linux: implement rt_sigsuspend(2) on arm64
... by making it architecture-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31259

(cherry picked from commit 72f7ddb587)
2022-02-14 18:45:24 +00:00
Edward Tomasz Napierala
f948f5ed7c linux(4): Improve comment about SA_RESTORER
No functional changes.

Sponsored By:	EPSRC

(cherry picked from commit 3eaf271d3c)
2022-02-14 00:09:13 +00:00
Edward Tomasz Napierala
b5cae0d5ed linux(4): implement PR_SET_NO_NEW_PRIVS
This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30973

(cherry picked from commit 2f514e6f13)
2022-02-14 00:08:55 +00:00
Philippe Michaud-Boudreault
b630d64c3d linux: implement statx(2)
PR:		252106
Reviewed By:	dchagin
Differential Revision:	https://reviews.freebsd.org/D30466

(cherry picked from commit 2362ad457a)
2022-02-14 00:07:33 +00:00
Edward Tomasz Napierala
3931de89e8 linux: add support for SO_PEERGROUPS
The su(8) and sudo(8) from Ubuntu Bionic use it.

Sponsored By:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28165

(cherry picked from commit cd84c82c6a)
2022-02-13 23:18:27 +00:00
shu
a95b05950a linux: remove locks around callout_drain in timerfd_close()
The lock around callout_drain() is unnecessary and may cause
deadlock when one closes a timer descriptor during timer execution.

Reviewed By:	delphij
Submitted By:	ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28148

(cherry picked from commit 14c40d2c29)
2022-02-13 23:09:31 +00:00
Edward Tomasz Napierala
89847f7e04 linux: Improve debugging by recognizing TIOCGPTPEER
Sponsored By:	EPSRC

(cherry picked from commit 1866c766d2)
2022-02-13 22:26:01 +00:00
Edward Tomasz Napierala
ecfdca5698 linux: Also translate the signal if the code is CLD_KILLED
This fixes ./waitid.gen.test from the strace(1) test suite.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32617

(cherry picked from commit c8c93b1516)
2022-02-13 22:25:28 +00:00
Edward Tomasz Napierala
c80c9fd9f6 linux: Fix ENOTSOCK handling in sendfile(2)
The Linux way for sendfile(2) to tell the application
to fallback to another way of copying data is by EINVAL,
not ENOTSOCK.  This fixes package installation scripts
for Mono packages from Focal.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32604

(cherry picked from commit 2c7f798282)
2022-02-13 22:25:12 +00:00
Edward Tomasz Napierala
e19cb5cc3c linux: recognize TCP_INFO and ratelimit the warning
This ratelimits the "unsupported getsockopt level 6 optname 11"
warnings that happen all the time when watching Netflix.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32454

(cherry picked from commit 99f563ed76)
2022-02-13 22:24:59 +00:00
Edward Tomasz Napierala
1f86d04d02 linux: Partially implement TCSBRK
This fixes tcflush(3), unbreaking cheribuild.py under arm64 Focal.

Reviewed By:	imp
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32291

(cherry picked from commit 7e7859e7c2)
2022-02-13 22:24:49 +00:00
Edward Tomasz Napierala
6bddc31166 linux: improve FUSE support
This fixes a number of AppImages; tested with
scribus-1.5.6.1-linux-x86_64.AppImage.

Reported By:	@probonopd
Reviewed By:	asomers, emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30606

(cherry picked from commit 128a1db806)
2022-02-13 22:23:41 +00:00
Edward Tomasz Napierala
cbab32bdf1 linux: deduplicate DUMMY() entries
No functional changes.

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30524

(cherry picked from commit 83043a741d)
2022-02-13 22:22:11 +00:00
Edward Tomasz Napierala
2e2337522f linux: add new syscall numbers
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30193

(cherry picked from commit 6d926e850d)
2022-02-13 21:30:49 +00:00
Edward Tomasz Napierala
1e44a4ea0d linux: support AT_EMPTY_PATH flag in fchownat(2)
This fixes rsyslog package installation scripts in Bionic.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29108

(cherry picked from commit e47823b831)
2022-02-13 21:06:38 +00:00
Edward Tomasz Napierala
8e4ef1c3a3 linux: make fstatat(2) handle AT_EMPTY_PATH
Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins.  It's also necessary for glibc 2.33, as found in recent
Arch snapshots.

PR:		254112
Reviewed By:	kib
Sponsored by:	The FreeBSD Foundation, EPSRC
Differential Revision:	https://reviews.freebsd.org/D28192

(cherry picked from commit 4b45c2bb83)
2022-02-13 21:06:37 +00:00
Edward Tomasz Napierala
da51ef550c linux: implement O_PATH
Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29773

(cherry picked from commit 1663120ae4)
2022-02-13 21:06:37 +00:00
Edward Tomasz Napierala
10553107a8 linux: extend the LINUX_O_ constants to make room for O_PATH
No functional changes.

Sponsored By:	EPSRC

(cherry picked from commit 1b11173c00)
2022-02-13 21:06:37 +00:00
Edward Tomasz Napierala
187a635352 linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609

(cherry picked from commit ca6e1fa3ce)
2022-02-13 21:06:37 +00:00
John Baldwin
1a9f14cfa5 Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD

(cherry picked from commit becaf6433b)
2022-02-16 11:55:37 -05:00
Konstantin Belousov
203bcad731 amd64: wrap 64bit sigtramp into vdso
(cherry picked from commit ab4524b3d7)
2022-01-02 18:43:01 +02:00
Konstantin Belousov
df674da44e cloudabi and linux ABIs: do not call umtx_thread_cleanup() from thr_exit syscall
(cherry picked from commit 747a6b7ace)
2021-07-22 01:11:52 +03:00
Dmitry Chagin
7a37d13b6c linux(4): Prevent integer overflow in futex_requeue.
To prevent a signed integer overflow in futex_requeue add a sanity check
to catch negative values of nrwake or nrrequeue.

(cherry picked from commit 25b09d6f39)
2021-06-29 15:58:32 -04:00
Konstantin Belousov
af3dce6141 Change the return type of sv__setid_allowed from bool to int
(cherry picked from commit 62b8258a7e)
2021-06-13 04:22:33 +03:00
Konstantin Belousov
dc107fe1f9 linuxolator: Add compat.linux.setid_allowed knob
PR:	21463

(cherry picked from commit 598f6fb49c)
2021-06-13 04:22:33 +03:00
Dmitry Chagin
e06cbb2b7a linux_common: retire extra module version.
The second 'linuxcommon' line was added by c66f5b079d
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to  'linux_common' and retire 'linuxcommon' line.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D30409

(cherry picked from commit 5184e2da41)
2021-06-10 12:39:38 +03:00
Vladimir Kondratyev
fd140c5216 epoll: Store epoll_event udata member in ext member of kevent.
Current epoll implementation stores udata fields of epoll_event
structure in special dynamically-sized table rather than in udata field
of backing kevent structure because of 2 reasons:
1. Kevent's udata size is smaller than epoll's on 32-bit archs.
2. Kevent's udata can be clobbered on execution EPOLL_CTL_ADD as kqueue
   modifies existing event while epoll returns error in this case.

After r320043 has introduced four new 64bit user data members (ext[]),
we can store epoll udata in one of them and drop aforementioned table.
According to kqueue_register() source code ext members are not updated
when existing kevent is modified that fixes p.2.

As a side effect the patch fixes PR/252582.

Reviewed by:	trasz
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28169

(cherry picked from commit b3c6fe663b)
2021-04-12 00:47:39 +03:00
John Baldwin
f9d590884a Rename linux_set_upcall_kse() to linux_set_upcall().
This matches the rename of cpu_set_upcall_kse() in
5c2cf81845.

Sponsored by:	DARPA

(cherry picked from commit 3b57ddb029)
2021-03-29 15:36:11 -07:00
shu
0cb6fa6acc linux: make timerfd_settime(2) set expirations count to zero
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2).  That's to say, once we do
a read or call timerfd_settime(), timer fd's expiration count should
be zero.  Some Linux applications create timerfd and add it to epoll
with LT mode, when event comes, they do timerfd_settime instead
of read to stop event source from trigger.  On FreeBSD,
timerfd_settime(2) didn't set the count to zero, which caused high
CPU utilization.

Submitted by:	ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28231

(cherry picked from commit ae71b794cb)
2021-03-19 21:36:11 +03:00
Edward Tomasz Napierala
ab1a91d958 linux(4): make getcwd(2) return ERANGE instead of ENOMEM
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.

PR:		kern/254120
Reported By:	Alex S <iwtcex@gmail.com>
Reviewed By:	kib
Sponsored By:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29217

(cherry picked from commit 0dfbdd9fc2)
2021-03-15 13:00:17 +00:00
Edward Tomasz Napierala
47d6ee406e linux: add support for SO_PEERSEC getsockopt
It returns "unconfined", like Linux without SELinux would.

Sponsored By:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28164

(cherry picked from commit e44a78ce6f)
2021-03-02 18:43:27 +00:00
Edward Tomasz Napierala
9d930fb090 linux: fix handling of flags for 32 bit send(2) syscall
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.

PR:		248065
Reported By:	Alex S <iwtcex@gmail.com>
Tested By:	Alex S <iwtcex@gmail.com>
Reviewed By:	emaste
MFC After:	3 days
Sponsored By:	The FreeBSD Foundation

(cherry picked from commit f6e8256a96)
2021-03-02 18:43:27 +00:00
Mark Johnston
3df2766a69 linux: Unmap the VDSO page when unloading
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO.  When destroying the VDSO object, we failed to
destroy the mapping and free KVA.  Fix this.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28696

(cherry picked from commit 0fc8a79672)
2021-02-22 20:29:55 -05:00
Edward Tomasz Napierala
7d3310c4fc linux: remove spurious newline.
Sponsored by:	The FreeBSD Foundation
2021-01-19 09:56:45 +00:00
Edward Tomasz Napierala
feb96ee9c8 linux: mute "unsupported socket(AF_NETLINK, 3, NETLINK_AUDIT)" warnings
They are way too noisy with Focal.

Sponsored by:	The FreeBSD Foundation
2021-01-14 09:16:28 +00:00
Edward Tomasz Napierala
ec2700e015 linux: mute the "unsupported prctl option 23" warnings
Make the PR_CAPBSET_READ prctl(2) return EINVAL without logging
any warnings; this is way too noisy with Focal.

Sponsored by:	The FreeBSD Foundation
2021-01-13 10:31:56 +00:00
Edward Tomasz Napierala
a339b4223a linux: bump the default version from 3.10.0 to 3.17.0
This is required for Qt5, as found in Ubuntu Focal.  The library contains
the minimum kernel version encoded in an ELF note; this makes rtld ignore
it altogether, with a confusing error message.  Without it, things fail
like this:

$ konsole: error while loading shared libraries: libQt5Core.so.5: cannot
open shared object file: No such file or directory

For reference, the Qt kernel version requirements can be found at:
https://github.com/qt/qtbase/blob/dev/src/corelib/global/minimum-linux_p.h

Sponsored by:	The FreeBSD Foundation
Reviewed By:	emaste
Differential Revision:	https://reviews.freebsd.org/D28105
2021-01-13 10:02:16 +00:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Konstantin Belousov
7a202823aa Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues.  Unfortunately, kqueues
are not passable between processes.  And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow.  This came up when debugging strange HW video decode
failures in Firefox.  A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Konstantin Belousov
7cb901bf22 Remove useless ARGUSED annotations.
Submitted by:	greg@unrelenting.technology
2020-12-27 12:57:26 +02:00
Konstantin Belousov
11c9f2ff1a Add SPDX tag.
Submitted by:	greg@unrelenting.technology
2020-12-27 12:57:26 +02:00
Tijl Coosemans
77fb6b6644 Move V4L feature declarations and DTrace provider definitions from
linux_common.c to linux_util.c so they become available on i386.

linux_common.c defines the linux_common kernel module but this module does
not exist on i386 and linux_common.c is not included in the linux module.
linux_util.c is included in the linux_common module on amd64 and the linux
module on i386.

Remove linux_common.c from files.i386 again.  It was added recently in
r367433 when the DTrace provider definitions were moved.

The V4L feature declarations were moved to linux_common in r283423.
2020-12-06 10:58:55 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Konstantin Belousov
4815f175d0 Linuxolator: Replace use of eventhandlers by sysent hooks.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 18:18:16 +00:00
Conrad Meyer
f8f74aaa84 linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1().  This change depends on the pwddesc split
introduced in r367777.

I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag).  This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.

In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.

The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).

Discussed with:	markj, trasz (earlier version)
Differential Revision:	https://reviews.freebsd.org/D27016
2020-11-17 21:20:11 +00:00
Conrad Meyer
ede4af47ae unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:

- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess

Discussed with:	kib, markj, trasz
Differential Revision:	https://reviews.freebsd.org/D27084
2020-11-17 20:01:21 +00:00