Commit graph

1314 commits

Author SHA1 Message Date
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS with the runtime variable maxphys. It is initialized from
MAXPHYS by default, but can also be adjusted with the tunable kern.maxphys.

Make the b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and size b_pages[] for pbufs to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save a significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to the desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place that initializes maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis were either submitted by, or based on changes by, mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
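To illustrate the b_pages[] sizing described in cd85379104, here is a minimal userland sketch, not the kernel code. It assumes 4 KiB pages; PG_SHIFT, PG_SIZE, struct mini_buf, mini_buf_alloc() and pages_spanned() are hypothetical stand-ins, and the local atop() only mimics the kernel macro of that name. It shows a flexible array member sized at allocation time, and why a maxphys-sized buffer that does not start on a page boundary needs atop(maxphys) + 1 page slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userland stand-ins (4 KiB pages assumed); not the kernel's definitions. */
#define PG_SHIFT	12
#define PG_SIZE		((uintptr_t)1 << PG_SHIFT)
#define atop(x)		((x) >> PG_SHIFT)	/* bytes -> pages, truncating */

/* Hypothetical, heavily cut-down buffer with a flexible page array. */
struct mini_buf {
	size_t	b_npages;
	void	*b_pages[];	/* length chosen at allocation time */
};

/* Allocate a buffer whose b_pages[] holds exactly npages slots. */
static struct mini_buf *
mini_buf_alloc(size_t npages)
{
	struct mini_buf *bp;

	bp = calloc(1, sizeof(*bp) + npages * sizeof(bp->b_pages[0]));
	if (bp != NULL)
		bp->b_npages = npages;
	return (bp);
}

/* Number of pages touched by the byte range [addr, addr + len), len > 0. */
static size_t
pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first = addr & ~(PG_SIZE - 1);
	uintptr_t last = (addr + len + PG_SIZE - 1) & ~(PG_SIZE - 1);

	return ((size_t)atop(last - first));
}
```

With maxphys at 1M, a page-aligned buffer spans 256 pages, while the same length starting 16 bytes into a page spans 257; that is the case the extra pbuf slot covers.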
Konstantin Belousov
4815f175d0 Linuxolator: Replace use of eventhandlers by sysent hooks.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 18:18:16 +00:00
Conrad Meyer
f8f74aaa84 linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1().  This change depends on the pwddesc split
introduced in r367777.

I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag).  This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
With FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.

In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.

The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).

Discussed with:	markj, trasz (earlier version)
Differential Revision:	https://reviews.freebsd.org/D27016
2020-11-17 21:20:11 +00:00
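A model of the flag translation described in f8f74aaa84, derived only from the commit message and reading "cloned" as "shared with the parent". The CLONE_FS and CLONE_FILES values are Linux's; the M_RFFDG and M_FR2_SHARE_PATHS bits, struct model_req and the helper functions are hypothetical stand-ins for the fork1() request, not FreeBSD's definitions. Because FR2_SHARE_PATHS flips the treatment of p_pd relative to p_fd, it ends up set exactly when the two Linux flags disagree.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux clone(2) flag values. */
#define L_CLONE_FS		0x00000200
#define L_CLONE_FILES		0x00000400

/* Hypothetical stand-ins for the fork1() request bits. */
#define M_RFFDG			0x1	/* copy the fd table instead of sharing it */
#define M_FR2_SHARE_PATHS	0x2	/* treat p_pd the opposite way to p_fd */

struct model_req {
	unsigned int	flags;
};

static struct model_req
model_from_clone(unsigned long lflags)
{
	bool share_files = (lflags & L_CLONE_FILES) != 0;
	bool share_fs = (lflags & L_CLONE_FS) != 0;
	struct model_req r = { 0 };

	if (!share_files)
		r.flags |= M_RFFDG;
	/* Path state differs from fd-table treatment iff the flags disagree. */
	if (share_fs != share_files)
		r.flags |= M_FR2_SHARE_PATHS;
	return (r);
}

static bool
child_shares_fd(struct model_req r)
{
	return ((r.flags & M_RFFDG) == 0);
}

static bool
child_shares_pd(struct model_req r)
{
	bool fd = child_shares_fd(r);

	return ((r.flags & M_FR2_SHARE_PATHS) != 0 ? !fd : fd);
}
```

In this model the Chrome case, CLONE_FS without CLONE_FILES, yields a copied fd table and shared path state.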
Conrad Meyer
ede4af47ae unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:

- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess

Discussed with:	kib, markj, trasz
Differential Revision:	https://reviews.freebsd.org/D27084
2020-11-17 20:01:21 +00:00
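Versioning a control-message payload, as ede4af47ae does, lets a reader reject a layout it does not understand before touching any other field. The sketch below is hypothetical: struct toy_creds and toy_creds_parse() only illustrate the pattern (version first, sender's pid included) and do not reproduce FreeBSD's actual structure or control message type.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define TOY_CREDS_VERSION	0

/* Hypothetical payload; the version comes first so it can be read alone. */
struct toy_creds {
	int	tc_version;
	pid_t	tc_pid;		/* sender's pid */
	uid_t	tc_uid;
	gid_t	tc_gid;
};

/* Return 0 and store the sender's pid, or -1 for a short or unknown record. */
static int
toy_creds_parse(const void *data, size_t len, pid_t *pid)
{
	const struct toy_creds *tc = data;

	if (len < sizeof(tc->tc_version))
		return (-1);
	if (tc->tc_version != TOY_CREDS_VERSION)
		return (-1);	/* unknown layout: do not guess */
	if (len < sizeof(*tc))
		return (-1);
	*pid = tc->tc_pid;
	return (0);
}
```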
Conrad Meyer
de774e422e linux(4): Implement name_to_handle_at(), open_by_handle_at()
They are similar to our getfhat(2) and fhopen(2) syscalls.

Differential Revision:	https://reviews.freebsd.org/D27111
2020-11-17 19:51:47 +00:00
Edward Tomasz Napierala
e3b1c847a4 Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  Should come in handy for AppImages.

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26959
2020-11-09 08:53:15 +00:00
Alexander Leidinger
8ec6c4a38b - add more linux socket options (sorted by value)
 - map those IPv4 / IPv6 socket options which exist in FreeBSD
   + most of them visually verified to have the same type/layout of arguments
   + not tested with linux programs to behave as intended
 - be more human-readable for known options which are not handled
 - be more verbose for unhandled socket message flags we know about
 - print the jail ID in linux_msg if run in a jail
 - add possibility to print debug message about known missing parts only once
 - add multiple levels of sysctl linux.debug:
   1: print debug messages, tell about unimplemented stuff (only once)
   2: like 1, but also print messages about implemented but not tested
      stuff (only once)
   3+: like 2, but no rate limiting of messages
 - increase default linux debug level from 1 to 3

We are a lot more verbose than we need to be (e.g. about some of the IP
socket options which are the same, share the same memory layout, and are
believed to work). The reason is that we have no good test suite for those
linux bits. The LTP and other test suites, like the Python one, are not
fully up to the task. Hence the excessive messages about emulated but
untested socket options.

IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.

Discussed with:	trasz
2020-11-08 09:50:58 +00:00
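The sysctl linux.debug levels listed in 8ec6c4a38b can be read as a small printing policy. The sketch below encodes that reading; the enum, should_print() and the per-message "printed" flag are hypothetical, and treating level 0 as "print nothing" is an assumption the commit message does not state. "Only once" is modeled as the rate limit that level 3 and above removes.

```c
#include <assert.h>
#include <stdbool.h>

enum msg_kind {
	MSG_UNIMPLEMENTED,	/* known-missing functionality */
	MSG_UNTESTED,		/* implemented, but not tested */
};

/*
 * Decide whether to print a message of the given kind at debug level
 * `level`; *printed tracks whether this particular message was shown.
 */
static bool
should_print(int level, enum msg_kind kind, bool *printed)
{
	if (level < 1)
		return (false);		/* assumption: 0 silences everything */
	if (kind == MSG_UNTESTED && level < 2)
		return (false);
	if (level >= 3)
		return (true);		/* no rate limiting */
	if (*printed)
		return (false);		/* levels 1 and 2: only once */
	*printed = true;
	return (true);
}
```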
Conrad Meyer
76b2bfeda4 linux(4): Fix loadable modules after r367395
Move dtrace SDT definitions into linux_common module code.  Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.

Reported by:	several
PR:		250897
Discussed with:	emaste, trasz
Tested by:	John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With:	r367395
Differential Revision:	https://reviews.freebsd.org/D27124
2020-11-06 22:04:57 +00:00
Conrad Meyer
e9b13c6612 linux(4): Deduplicate unimpl/dummy syscall handlers
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27099
2020-11-05 19:30:31 +00:00
Edward Tomasz Napierala
cdf6e4e922 Unbreak buildworld after r367339.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-11-04 21:39:04 +00:00
Edward Tomasz Napierala
2f927d87f9 Add linux_to_bsd_errtbl[], mapping Linux errnos to their BSD counterparts.
This will be used by fuse(4).

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26974
2020-11-04 19:54:18 +00:00
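A table like linux_to_bsd_errtbl[] is the inverse of linux_errtbl[], which is indexed by FreeBSD errno. One way to derive such an inverse is sketched below; it is a toy, not the committed table, which may well be written out statically. The numbers are real: EPERM and ENOENT agree on both systems, while EDEADLK and EAGAIN are swapped (11 and 35 on FreeBSD, 35 and 11 on Linux). toy_errtbl, toy_to_bsd and toy_build_inverse() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))

/* Toy forward table: index = FreeBSD errno, value = Linux errno. */
static const int toy_errtbl[] = {
	[1] = 1,	/* EPERM: same number on both */
	[2] = 2,	/* ENOENT: same number on both */
	[11] = 35,	/* EDEADLK */
	[35] = 11,	/* EAGAIN */
};

#define TOY_LINUX_EMAX	35
static int toy_to_bsd[TOY_LINUX_EMAX + 1];	/* index = Linux errno */

static void
toy_build_inverse(void)
{
	for (size_t bsd = 1; bsd < nitems(toy_errtbl); bsd++) {
		int lin = toy_errtbl[bsd];

		/* Zero marks an index the toy table leaves out. */
		if (lin > 0 && lin <= TOY_LINUX_EMAX)
			toy_to_bsd[lin] = (int)bsd;
	}
}
```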
Conrad Meyer
9e47480e94 linux(4): Improve netlink diagnostics
Add some missing netlink_family definitions and produce vaguely
human-readable error messages for those definitions, like we used to do for
just ROUTE and KOBJECT_UEVENTS.

Additionally, if we know it's a netfilter socket but didn't find it in the
table, fall back to printing that instead of the generic handler ("socket
domain 16, ...").

No change to the emulator correctness, just mildly improved diagnostics for
gaps.
2020-11-03 19:50:42 +00:00
Edward Tomasz Napierala
7abf30d339 Make linux_errtbl[] static.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27004
2020-11-03 19:12:33 +00:00
Edward Tomasz Napierala
939e5de8d4 Fix rookie mistake - it's nitems(), not sizeof().
Reported by:	xtouqh_icloud.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-11-03 14:44:33 +00:00
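The distinction behind 939e5de8d4: sizeof on an array yields its size in bytes, while nitems() yields its element count. The macro below matches the definition in FreeBSD's <sys/param.h>; the table and sum_table() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Element count, as defined in FreeBSD's <sys/param.h>. */
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))

static const int table[] = { 10, 20, 30, 40 };

static int
sum_table(void)
{
	int sum = 0;

	/*
	 * Bounding i by sizeof(table) would run to the byte count
	 * (16 with a 4-byte int) and read past the end.
	 */
	for (size_t i = 0; i < nitems(table); i++)
		sum += table[i];
	return (sum);
}
```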
Conrad Meyer
eaa5afcefa linux(4) prctl(2): Implement PR_[GS]ET_DUMPABLE
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.

TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
cannot be ptraced.  There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.

Reviewed by:	markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D27015
2020-11-03 02:10:54 +00:00
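The proxying in eaa5afcefa can be pictured as a two-way mapping between the Linux dumpable flag and a trace-control state. In this sketch, enum toy_trace and both helpers are hypothetical rather than FreeBSD's procctl(2) constants, and accepting only 0 and 1 for PR_SET_DUMPABLE is an assumption based on current Linux behavior.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for FreeBSD's trace-control states. */
enum toy_trace {
	TOY_TRACE_ENABLED,	/* may be traced and coredumped */
	TOY_TRACE_DISABLED,	/* neither traced nor coredumped */
};

/* PR_SET_DUMPABLE: map the Linux argument onto a trace state. */
static int
toy_set_dumpable(unsigned long arg, enum toy_trace *state)
{
	switch (arg) {
	case 0:
		*state = TOY_TRACE_DISABLED;
		return (0);
	case 1:
		*state = TOY_TRACE_ENABLED;
		return (0);
	default:
		return (EINVAL);
	}
}

/* PR_GET_DUMPABLE: report the trace state back as 0 or 1. */
static int
toy_get_dumpable(enum toy_trace state)
{
	return (state == TOY_TRACE_ENABLED ? 1 : 0);
}
```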
Conrad Meyer
443d8a07df linux(4): Emulate Linux SOL_SOCKET:SO_PASSCRED
This is required by some major linux applications, such as Chrome and
Firefox.  (As well as Electron-using applications, which are essentially
a bundled version of Chrome.)

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27012
2020-11-03 01:19:13 +00:00
Conrad Meyer
a98f03786e linux(4): style: Eliminate dead 'break' after 'return'
No functional change.
2020-11-03 01:10:27 +00:00
Conrad Meyer
7731194090 linux(4): Quiesce unrecognized ioctl warning for F2FS query
On Linux, sqlite probes for underlying F2FS filesystems that support
certain kinds of atomic update with this ioctl.  The expected result on
non-F2FS filesystems (i.e., all FreeBSD filesystems) is any error value.

Minimally implement the ioctl and avoid the warning message.

(This shows up in Linux Chrome, which embeds sqlite.)

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27050
2020-11-02 18:45:43 +00:00
Conrad Meyer
53efdb55a8 linux(4): Deduplicate ioctl range construction with a helper macro
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27049
2020-11-02 18:45:15 +00:00
Conrad Meyer
63ed2e3642 linux(4): Disambiguate identical ioctl errors in distinct paths
And stop truncating the full ioctl number in the error message.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D27048
2020-11-02 06:16:11 +00:00
Conrad Meyer
76dfd556f1 linux(4): Add missing clone(2) flags
2020-10-31 01:12:35 +00:00
Conrad Meyer
ae9cafd919 linux(4): Quiesce warning about madvise(..., -1)
This API misuse is intended to produce an error value to detect certain
bogus stub implementations of MADV_WIPEONFORK.  We don't need to log a
warning about it.

Example:
https://boringssl.googlesource.com/boringssl/+/ad5582985cc6b89d0e7caf0d9cc7e301de61cf66%5E%21/

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27017
2020-10-30 19:02:59 +00:00
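The probe referenced in ae9cafd919 relies on a real madvise(2) rejecting an advice value it does not know. Below is a minimal userland sketch of that probe, assuming a system where madvise() and MAP_ANON are available (FreeBSD, or Linux with glibc); madvise_rejects_bogus_advice() is a hypothetical name, and this is not BoringSSL's code.

```c
#define _DEFAULT_SOURCE		/* expose madvise() and MAP_ANON on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return true if madvise() fails for a deliberately invalid advice value,
 * i.e. the call is backed by a real implementation rather than a stub
 * that reports success for everything.  Also returns false if the
 * scratch mapping cannot be created.
 */
static bool
madvise_rejects_bogus_advice(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p;
	int rc;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
	    -1, 0);
	if (p == MAP_FAILED)
		return (false);
	rc = madvise(p, len, -1);	/* the "API misuse" the commit mentions */
	munmap(p, len);
	return (rc != 0);
}
```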
Edward Tomasz Napierala
b60b81e643 Fix typo.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-10-29 14:42:51 +00:00
Edward Tomasz Napierala
1a8577fa68 Add defines for Linux errno values and use them to make linux_errtbl[]
more readable.  While here, add linux_check_errtbl() function to make
sure we don't leave holes.

No objections:	emaste (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26972
2020-10-29 14:23:52 +00:00
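The hole check added in 1a8577fa68 can be sketched with designated initializers, which bind each value to a named index so that a missing entry shows up as a zero. The BSD_*/LINUX_* constants, toy_errtbl and toy_check_errtbl() are hypothetical; the real table covers every FreeBSD errno. Index 0 means "no error" and is skipped.

```c
#include <assert.h>
#include <stddef.h>

#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))

/* Hypothetical names; these four numbers happen to agree on both systems. */
#define BSD_EPERM	1
#define BSD_ENOENT	2
#define BSD_ESRCH	3
#define BSD_EINTR	4
#define LINUX_EPERM	1
#define LINUX_ENOENT	2
#define LINUX_ESRCH	3
#define LINUX_EINTR	4

/* Indexed by FreeBSD errno; any index left out is zero-filled. */
static const int toy_errtbl[BSD_EINTR + 1] = {
	[BSD_EPERM]	= LINUX_EPERM,
	[BSD_ENOENT]	= LINUX_ENOENT,
	[BSD_ESRCH]	= LINUX_ESRCH,
	[BSD_EINTR]	= LINUX_EINTR,
};

/* Return the first FreeBSD errno with no Linux mapping, or -1 if none. */
static int
toy_check_errtbl(const int *tbl, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (tbl[i] == 0)
			return ((int)i);
	}
	return (-1);
}
```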
Edward Tomasz Napierala
1701c69b6e Make linux_errtbl a bit more readable by using named initializers.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26970
2020-10-28 14:16:08 +00:00
Edward Tomasz Napierala
866b1f5147 Fix misnomer - linux_to_bsd_errno() does the exact opposite.
Reported by:	arichardson
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26965
2020-10-27 12:49:40 +00:00
Mateusz Guzik
fe76bef462 linux: silence renameat2 flags warning
Hogs the console while building the Linux kernel in an Ubuntu Focal jail.
2020-10-26 18:03:50 +00:00
Mateusz Guzik
1024de70f9 linux: add missing conversions for compat.linux.use_emul_path handling
2020-10-26 18:02:52 +00:00
Edward Tomasz Napierala
b3be0b4d0c Tweak linux(4) socket(2) debug messages.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26900
2020-10-24 14:25:38 +00:00
Edward Tomasz Napierala
62b1382ff3 Further improve prctl(2) debug.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26916
2020-10-24 14:23:44 +00:00
Edward Tomasz Napierala
1c7481377c Improve prctl(2) debug.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26899
2020-10-23 12:00:30 +00:00
Edward Tomasz Napierala
f4d91df5a0 Make linux(4) warn about unsupported socket(2) types.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25680
2020-10-21 18:45:48 +00:00
Edward Tomasz Napierala
1a34e9fad6 Fix potential race condition in linux stat(2).
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25618
2020-10-20 17:19:10 +00:00
Edward Tomasz Napierala
54669eb779 Add compat.linux.dummy_rlimits, and disable by default.
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26814
2020-10-18 15:58:16 +00:00
Edward Tomasz Napierala
1c34dcb532 Set default stack size for Linux apps to 8MB. This matches Linux
defaults, makes core files smaller, and fixes applications which use
pthread_join(3) incorrectly, namely Steam.

This is based on a patch submitted by Jason Yang, which I've reworked
to set the limit instead of only changing the value reported (which
is enough to fix the bug for Linux pthreads, but could be confusing).

PR:		248225
Submitted by:	Jason_YH_Yang at wistron.com (earlier version)
Analyzed by:	Alex S <iwtcex@gmail.com>
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26778
2020-10-16 11:23:30 +00:00
Edward Tomasz Napierala
139c09788b Make linux getrlimit(2) and prlimit(2) return something reasonable
for linux-specific limits.  Fixes prlimit (util-linux-2.31.1-0.4ubuntu3.7).

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26777
2020-10-16 10:10:09 +00:00
Konstantin Belousov
aaf78c16f5 Do not leak oldvmspace if image activation failed
and the current address space is already destroyed, so kern_execve()
terminates the process.

While there, clean up some internals of post_execve() inlined in init_main.

Reported by:	Peter <pmc@citylink.dinoex.sub.org>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26525
2020-09-23 18:03:07 +00:00
Edward Tomasz Napierala
106a784b35 Reduce code duplication by introducing linux_copyout_sockaddr()
helper function.  No functional changes.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25804
2020-09-17 12:14:24 +00:00
Edward Tomasz Napierala
79e3da0602 Add support for SOUND_MIXER_WRITE_MONITOR ioctl. Fixes alsamixer(1)
on my x220.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25806
2020-09-17 11:44:45 +00:00
Edward Tomasz Napierala
70890254b3 Get rid of sv_errtbl and SV_ABI_ERRNO().
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26388
2020-09-17 11:39:33 +00:00
Mark Johnston
46888dedc9 Add emulation support for the Linux kcov(4) ioctl API.
This makes it possible to run an unmodified Linux syzkaller executor
against the Linuxulator, and have it gather code coverage information.

Sponsored by:	The FreeBSD Foundation
2020-09-04 00:12:28 +00:00
Mateusz Guzik
1a18003240 compat: clean up empty lines in .c and .h files
2020-09-01 21:24:33 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always-curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Mateusz Guzik
a125ed50a6 linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
2020-08-18 22:04:22 +00:00
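Why the lookups in a125ed50a6 are spurious can be seen in a simplified model of emul_path-style redirection: try the path under the emulation prefix first, then fall back to the path as given. In a jail whose root is already a Linux tree the prefixed lookup never succeeds, so every absolute path pays for an extra lookup. toy_resolve() and the existence callback are hypothetical, "/compat/linux" is used only as an example prefix, and the real lookup logic is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a real filesystem lookup. */
typedef bool (*exists_fn)(const char *path);

/*
 * Copy the path that would be used into out and return how many
 * lookups it took (the final plain lookup is counted, not modeled).
 */
static int
toy_resolve(bool use_emul_path, const char *prefix, const char *path,
    exists_fn exists, char *out, size_t outlen)
{
	if (use_emul_path && path[0] == '/') {
		char alt[1024];

		snprintf(alt, sizeof(alt), "%s%s", prefix, path);
		if (exists(alt)) {
			snprintf(out, outlen, "%s", alt);
			return (1);
		}
		/* Prefixed lookup failed: fall back, paying a second lookup. */
		snprintf(out, outlen, "%s", path);
		return (2);
	}
	snprintf(out, outlen, "%s", path);
	return (1);
}

/* In a Linux-rooted jail nothing exists under the prefix. */
static bool
never_exists(const char *path)
{
	(void)path;
	return (false);
}
```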
Mark Johnston
a7044c60a5 Fix handling of ancillary data on non-AF_UNIX Linux sockets.
After r340674, the "continue" would restart the loop without having
updated clen, resulting in an infinite loop.  Restore the old behaviour
of simply ignoring all control messages on such sockets, since we
currently only implement handling for AF_UNIX-specific messages.

Reported by:	syzkaller
Reviewed by:	tijl
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26093
2020-08-18 14:17:14 +00:00
Mark Johnston
d9565182fd Remove "emulation" of clone(CLONE_PARENT | CLONE_THREAD).
On Linux this is supposed to result in EINVAL.

Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-17 21:30:49 +00:00
Mark Johnston
74a796e0fc Fix a lock leak when emulating futex(FUTEX_WAIT_BITSET).
Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-17 21:30:15 +00:00
Mark Johnston
30dcce2709 Skip Linux madvise(MADV_DONTNEED) on unmanaged objects.
vm_object_madvise() is a no-op for unmanaged objects, but we should also
limit the scope of mappings on which pmap_remove() is called.  In
particular, with the WIP largepage shm objects patch the kernel must
remove mappings of such objects along superpage boundaries, and without
this check Linux madvise(MADV_DONTNEED) could violate that requirement.

Reviewed by:	alc, kib
MFC with:	r362631
Sponsored by:	Juniper Networks, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D26084
2020-08-17 17:14:56 +00:00
Edward Tomasz Napierala
aa75412146 Make linux(4) support the BLKPBSZGET ioctl. Oracle uses it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25694
2020-07-19 12:25:03 +00:00