Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
from a Linux binary. Should come handy for AppImages.
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26959
- map those IPv4 / IPv6 socket options which exist in FreeBSD
+ most of them visually verified to have the same type/layout of arguments
+ not tested with linux programs to behave as intended
- be more human readable for known options which are not handled
- be more verbose for unhandled socket message flags we know about
- print the jail ID in linux_msg if run in a jail
- add possibility to print debug message about known missing parts only once
- add multiple levels of sysctl linux.debug:
1: print debug messages, tell about unimplemented stuff (only once)
2: like 1, but also print messages about implemented but not tested
stuff (only once)
3+: like 2, but no rate limiting of messages
- increase default linux debug level from 1 to 3
We are a lot more verbose in as we need to be (e.g. some of the IP socket
options which are the same, and share the same memory layout, and are
believed to work). The reason is that we have no good testsuite to test those
linux-bits. The LTP or other test suites like the python one, are not fully
up to the task we need. As such the excessive messages about emulated but not
tested socket options.
IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.
Discussed with: trasz
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124
This will be used by fuse(4).
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26974
Add some missing netlink_family definitions and produce vaguely
human-readable error messages for those definitions, like we used to do for
just ROUTE and KOBJECT_UEVENTS.
Additionally, if we know it's a netfilter socket but didn't find it in the
table, fall back to printing that instead of the generic handler ("socket
domain 16, ...").
No change to the emulator correctness, just mildly improved diagnostics for
gaps.
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.
TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced. There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.
Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D27015
This is required by some major linux applications, such as Chrome and
Firefox. (As well as Electron-using applications, which are essentially
a bundled version of Chrome.)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27012
On Linux, sqlite probes for underlying F2FS filesystems that support
certain kinds of atomic update with this ioctl. The expected result on
non-F2FS filesystem (i.e., all FreeBSD filesystems) is any error value.
Minimally implement the ioctl and avoid the warning message.
(This shows up in Linux Chrome, which embeds sqlite.)
Reviewed by: emaste, trasz
Differential Revision: https://reviews.freebsd.org/D27050
more readable. While here, add linux_check_errtbl() function to make
sure we don't leave holes.
No objections: emaste (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26972
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814
defaults, makes core files smaller, and fixes applications which use
pthread_join(3) in a wrong way, namely Steam.
This is based on a patch submitted by Jason Yang, which I've reworked
to set the limit instead of only changing the value reported (which
is enough to fix the bug for Linux pthreads, but could be confusing).
PR: 248225
Submitted by: Jason_YH_Yang at wistron.com (earlier version)
Analyzed by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26778
and current address space is already destroyed, so kern_execve()
terminates the process.
While there, clean up some internals of post_execve() inlined in init_main.
Reported by: Peter <pmc@citylink.dinoex.sub.org>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26525
This makes it possible to run an unmodified Linux syzkaller executor
against the Linuxulator, and have it gather code coverage information.
Sponsored by: The FreeBSD Foundation
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
After r340674, the "continue" would restart the loop without having
updated clen, resulting in an infinite loop. Restore the old behaviour
of simply ignoring all control messages on such sockets, since we
currently only implement handling for AF_UNIX-specific messages.
Reported by: syzkaller
Reviewed by: tijl
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26093
vm_object_madvise() is a no-op for unmanaged objects, but we should also
limit the scope of mappings on which pmap_remove() is called. In
particular, with the WIP largepage shm objects patch the kernel must
remove mappings of such objects along superpage boundaries, and without
this check Linux madvise(MADV_DONTNEED) could violate that requirement.
Reviewed by: alc, kib
MFC with: r362631
Sponsored by: Juniper Networks, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26084