Split cpuset_getaffinity() into a two counterparts, where the
user_cpuset_getaffinity() is intended to operate on the cpuset_t from
user va, while kern_cpuset_getaffinity() expects the cpuset from kernel
va.
Accordingly, the code that clears the high bits is moved to the
user_cpuset_getaffinity(). Linux sched_getaffinity() syscall returns
the size of set copied to the user-space and then glibc wrapper clears
the high bits.
MFC after: 2 weeks
(cherry picked from commit d46174cd88)
As linux_execve is common across archs, except amd64 32-bit Linuxulator,
move it under compat/linux.
Noted by: andrew@
MFC after: 2 weeks
(cherry picked from commit 26700ac0c4)
Summary:
BITSET uses long as its basic underlying type, which is dependent on the
compile type, meaning on 32-bit builds the basic type is 32 bits, but on
64-bit builds it's 64 bits. On little endian architectures this doesn't
matter, because the LSB is always at the low bit, so the words get
effectively concatenated moving between 32-bit and 64-bit, but on
big-endian architectures it throws a wrench in, as setting bit 0 in
32-bit mode is equivalent to setting bit 32 in 64-bit mode. To
demonstrate:
32-bit mode:
BIT_SET(foo, 0): 0x00000001
64-bit sees: 0x0000000100000000
cpuset is the only system interface that uses bitsets, so solve this
by swapping the integer sub-components at the copyin/copyout points.
Reviewed by: kib
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D35225
(cherry picked from commit 47a57144af)
Fix the build after 47a57144
(cherry picked from commit 89737eb829)
cpuset: Fix the KASAN and KMSAN builds
Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to
something less generic, since sanitizers define interceptors for
copyin() and copyout() using #define.
Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com
Fixes: 47a57144af ("cpuset: Byte swap cpuset for compat32 on big endian architectures")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4a3e51335e)
Use Linux semantics for the thread affinity syscalls.
Linux has more tolerant checks of the user supplied cpuset_t's.
Minimum cpuset_t size that the Linux kernel permits in case of
getaffinity() is the maximum CPU id, present in the system / NBBY,
the maximum size is not limited.
For setaffinity(), Linux does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where
the upper bound is the maximum CPU id, present in the system, no larger
than the size of the kernel cpuset_t.
Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(),
so clear it in the sched_setaffinity() and Linuxulator itself.
Reviewed by: Pau Amma (man pages)
In collaboration with: jhb
Differential revision: https://reviews.freebsd.org/D34849
MFC after: 2 weeks
(cherry picked from commit f35093f8d6)
There are many places where we copyin Linux timespec from the userspace
and then convert it to the kernel timespec. To avoid code duplication
add a tiny halper for doing this.
MFC after: 2 weeks
(cherry picked from commit 707e567a40)
Assuming the kernel would use random data, the 64-bit Linux kernel ignores
upper 32 bits of tv_nsec of struct timespec64 for 32-bit binaries.
MFC after: 2 weeks
(cherry picked from commit 1579b320f1)
There are many places where we convert natvie timespec and copyout it to
the userspace. To avoid code duplication add a tiny halper for doing this.
MFC after: 2 weeks
(cherry picked from commit 9a9482f874)
According to pselect6 manual, on error timeout becomes undefined, by fact
Linux modifies the timeout and ignore EFAULT error if so.
MFC after: 2 weeks
(cherry picked from commit 5171ed79f6)
Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.
Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks
(cherry picked from commit d5dc757e84)
Since Linux 5.4, if id is zero, then wait for any child that is in the same
process grop as the caller's process group.
Differential revision: https://reviews.freebsd.org/D31567
MFC after: 2 weeks
(cherry picked from commit 4e3aefb923)
As FreeBSD does not have __WALL option bit analogue explicitly set all
possible option bits to emulate Linux __WALL wait option bit.
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31555
MFC after: 2 weeks
(cherry picked from commit c7ef7c3fac)
Also fix bug in waitid() implementation, use wru_self not wru_children.
Differential revision: https://reviews.freebsd.org/D31552
MFC after: 2 weeks
(cherry picked from commit 0c6b1ff7de)
Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808
(cherry picked from commit 99454d3e98)
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.
Relnotes: yes
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32868
(cherry picked from commit a90ff3c4bc)
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().
I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.
This is the first stage of Linuxulator's vdso revision.
Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks
(cherry picked from commit c1da89fec2)
For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.
At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.
Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.
Differential Revision: https://reviews.freebsd.org/D30716
MFC after: 2 weeks
(cherry picked from commit 26795a0378)
Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to
something less generic, since sanitizers define interceptors for
copyin() and copyout() using #define.
Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com
Fixes: 47a57144af ("cpuset: Byte swap cpuset for compat32 on big endian architectures")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4a3e51335e)
This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30973
(cherry picked from commit 2f514e6f13)
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.
TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced. There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.
Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D27015
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
PR: kern/240432
Analyzed by by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25248
- Use the same definition of free memory as Linux.
- Rename the totalbig and freebig fields to match the corresponding
names on Linux.
Discussed with: alc
MFC after: 1 week