To help port the Linux emulation layer to new platforms, start using
Linux names for conditional builds instead of architecture-specific ifdefs.
MFC after: 1 week
(cherry picked from commit 2a1cf1b6b55c8326bbe85d0fdf17b0f2fb9b34ce)
On Linux these system calls have an effect only when used in conjunction
with an I/O scheduler that supports I/O priorities. If no I/O scheduler
has been set for a thread, then by default the I/O priority will follow
the CPU nice value. Because FreeBSD lacks I/O scheduler facilities, only
the default Linux behavior is implemented.
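As an illustration of that default, here is a minimal userland sketch of
how an I/O priority can be derived from a nice value. The constants and the
formula io_nice = (cpu_nice + 20) / 5 follow the Linux ioprio_set(2)
documentation for normally scheduled threads; the helper name
default_ioprio_from_nice() is hypothetical and this is not the code from
the commit.

```c
#include <assert.h>

/* Constants as defined by the Linux ioprio interface. */
#define	IOPRIO_CLASS_SHIFT	13
#define	IOPRIO_CLASS_BE		2	/* best-effort class */
#define	IOPRIO_PRIO_VALUE(class, data)	\
	(((class) << IOPRIO_CLASS_SHIFT) | (data))

/*
 * Hypothetical helper: derive the default I/O priority from a CPU nice
 * value in the range [-20, 19].  The result is a best-effort priority
 * with a level between 0 (highest) and 7 (lowest).
 */
static int
default_ioprio_from_nice(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, (nice + 20) / 5));
}
```

With this mapping a nice value of 0 lands on best-effort level 4, the
middle of the range.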
Ubuntu 23.04 debootstrap requires Linux ionice which depends on these
syscalls.
Differential Revision: https://reviews.freebsd.org/D41153
MFC after: 1 month
The two main uses of dev_t are in struct stat and as a parameter of the
mknod system calls.
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit quantity
with 12 bits set aside for the major number and 20 for the minor number.
The in-kernel dev_t is encoded as MMMmmmmm, where M is a hex digit of the
major number and m is a hex digit of the minor number.
The user-space dev_t is encoded as mmmM MMmm, where M and m are hex digits
of the major and minor numbers, respectively. This is downward compatible
with legacy systems where dev_t is 16 bits wide, encoded as MMmm.
In glibc dev_t is a 64-bit quantity, with 32-bit major and minor numbers,
encoded as MMMM Mmmm mmmM MMmm. This is downward compatible with the Linux
kernel and with legacy systems where dev_t is 16 bits wide.
In FreeBSD, dev_t is a 64-bit quantity. The major and minor numbers
are encoded as MMMmmmMm, so device numbers must be converted between
Linux user-space and the FreeBSD kernel.
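To make the Linux layouts concrete, the following sketch packs and unpacks
the 32-bit user-space encoding (mmmM MMmm) and the in-kernel encoding
(MMMmmmmm) described above. The function names are hypothetical and chosen
for illustration only; the FreeBSD side of the conversion is deliberately
not shown.

```c
#include <stdint.h>

/* Linux user-space 32-bit dev_t, layout mmmM MMmm. */
static uint32_t
linux_encode_dev(uint32_t major, uint32_t minor)
{
	return ((minor & 0xff) | ((major & 0xfff) << 8) |
	    ((minor & 0xfff00) << 12));
}

/* Extract the 12-bit major number from the user-space encoding. */
static uint32_t
linux_dev_major(uint32_t dev)
{
	return ((dev >> 8) & 0xfff);
}

/* Extract the 20-bit minor number from the user-space encoding. */
static uint32_t
linux_dev_minor(uint32_t dev)
{
	return ((dev & 0xff) | ((dev >> 12) & 0xfff00));
}

/* Linux in-kernel dev_t, layout MMMmmmmm. */
static uint32_t
linux_kdev(uint32_t major, uint32_t minor)
{
	return (((major & 0xfff) << 20) | (minor & 0xfffff));
}
```

For a device with major 8 and minor 1, linux_encode_dev() yields 0x801,
which is identical to the legacy 16-bit MMmm form, illustrating the
downward compatibility mentioned above.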
To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains the implementation of everything declared
in linux_time.h.
MFC after: 2 weeks
This obsolete system call is not supported by glibc. In ancient libc
versions (before glibc 2.0), uselib() was used to load the shared
libraries with names found in an array of names in the binary.
On Linux, since 3.15, this system call is available only when
the kernel is configured with the CONFIG_USELIB option.
It doesn't look like anyone needs this syscall for other Linuxulators,
so move it to the corresponding MD Linuxulator.
MFC after: 2 weeks
Include sys/sysent.h directly where it is needed. linux_util.h is included
in most source files of the Linuxulator, so avoid collecting rarely used
includes there.
MFC after: 2 weeks
Include vm headers directly where they are needed. linux_util.h is included
in most source files of the Linuxulator, so avoid collecting rarely used
includes there.
MFC after: 2 weeks
Split cpuset_getaffinity() into two counterparts:
user_cpuset_getaffinity() is intended to operate on a cpuset_t from
user va, while kern_cpuset_getaffinity() expects the cpuset from kernel
va.
Accordingly, the code that clears the high bits is moved to
user_cpuset_getaffinity(). The Linux sched_getaffinity() syscall returns
the size of the set copied to user space, and the glibc wrapper then
clears the high bits.
MFC after: 2 weeks
Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to
something less generic, since sanitizers define interceptors for
copyin() and copyout() using #define.
Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com
Fixes: 47a57144af ("cpuset: Byte swap cpuset for compat32 on big endian architectures")
Sponsored by: The FreeBSD Foundation
Linux has more tolerant checks of user-supplied cpuset_t's.
The minimum cpuset_t size that the Linux kernel permits for
getaffinity() is the maximum CPU id present in the system / NBBY;
the maximum size is not limited.
For setaffinity(), Linux does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where
the upper bound is the maximum CPU id present in the system, no larger
than the size of the kernel cpuset_t.
Unlike FreeBSD, Linux ignores high bits if they are set in setaffinity(),
so clear them in sched_setaffinity() and in the Linuxulator itself.
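The tolerant setaffinity() handling can be modelled in userland roughly as
follows. This is only a sketch under stated assumptions: the mask is
treated as a plain byte array with bit 0 of byte 0 standing for CPU 0, and
the helper name sanitize_user_cpumask() and its parameters are hypothetical
rather than taken from the commit.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a kernel-sized CPU mask from a user buffer
 * of arbitrary length.  Bytes beyond the kernel mask size are ignored,
 * a shorter buffer is zero-extended, and bits for CPU ids at or above
 * ncpus are cleared.
 */
static void
sanitize_user_cpumask(unsigned char *kmask, size_t ksize,
    const unsigned char *ubuf, size_t usize, unsigned int ncpus)
{
	size_t bit, n;

	n = usize < ksize ? usize : ksize;
	memset(kmask, 0, ksize);
	memcpy(kmask, ubuf, n);
	for (bit = ncpus; bit < ksize * CHAR_BIT; bit++)
		kmask[bit / CHAR_BIT] &=
		    (unsigned char)~(1u << (bit % CHAR_BIT));
}
```

An oversized buffer with every bit set therefore collapses to a mask that
names only the CPUs that exist, instead of being rejected.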
Reviewed by: Pau Amma (man pages)
In collaboration with: jhb
Differential revision: https://reviews.freebsd.org/D34849
MFC after: 2 weeks
There are many places where we copy in a Linux timespec from user space
and then convert it to the kernel timespec. To avoid code duplication,
add a tiny helper for doing this.
MFC after: 2 weeks
Because they may contain random data, the 64-bit Linux kernel ignores the
upper 32 bits of tv_nsec of struct timespec64 for 32-bit binaries.
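A minimal sketch of that masking, modelled on the Linux behavior of
zeroing the upper half of tv_nsec for compat callers. The function name
compat_tv_nsec() and the is_32bit_binary flag are hypothetical stand-ins
for whatever the real code uses to detect a 32-bit process.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the tv_nsec value the kernel should act
 * on.  For a 32-bit binary only the low 32 bits are meaningful, so the
 * upper 32 bits are discarded; a 64-bit binary's value is used as is.
 */
static int64_t
compat_tv_nsec(int64_t raw, int is_32bit_binary)
{
	if (is_32bit_binary)
		return ((int64_t)((uint64_t)raw & 0xffffffffULL));
	return (raw);
}
```

The masked value still goes through the usual range check afterwards, so
garbage in the low 32 bits is rejected as before.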
MFC after: 2 weeks
There are many places where we convert a native timespec and copy it out
to user space. To avoid code duplication, add a tiny helper for doing this.
MFC after: 2 weeks
Historically, the 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983, the old i386 Linux world can't be used under
the amd64 Linuxulator, as it doesn't know anything about the amd64 machine
(which is now returned by the newuname() syscall). So, add a knob to
switch the behavior and allow i386 Linux binaries to be used on amd64.
Set the knob to the new behavior, as I think this is common in modern
Linux distros.
Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks
Since Linux 5.4, if id is zero, then wait for any child that is in the same
process group as the caller's process group.
Differential revision: https://reviews.freebsd.org/D31567
MFC after: 2 weeks
As FreeBSD does not have an analogue of the __WALL option bit, explicitly
set all possible option bits to emulate the Linux __WALL wait option bit.
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31555
MFC after: 2 weeks
Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar to how we handle seccomp
in linux_prctl().
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808
Don't emit warnings; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar to how we handle seccomp
in linux_prctl().
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.
Relnotes: yes
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32868
This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30973