Commit graph

198 commits

Author SHA1 Message Date
Dmitry Chagin
c9ec2fb86c linux(4): Drop the outdated comments about sixth register on i386 int0x80
This is well documented in the Linux syscall(2).

MFC after:		1 week

(cherry picked from commit 5bdd74cc05e6c7d110688feacdbd22b6dffe5d72)
2023-10-18 08:52:33 +03:00
Konstantin Belousov
9a077205ca linuxolator: fix nosys() to not send SIGSYS
(cherry picked from commit 7acc4240ce00af540093b47ad00be0508310b515)
2023-10-09 06:24:31 +03:00
Dmitry Chagin
cba7b3b956 linux(4): Cleanup includes under amd64/linux32
No functional changes.

MFC after:		1 week

(cherry picked from commit ba90a31d08e413b82c1d1d4d697c2cdb32a13674)
2023-09-24 13:51:36 +03:00
Dmitry Chagin
3460fab5fc linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743 2023-08-18 13:12:02 +03:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Dmitry Chagin
4281dab8bc linux(4): Add elf_hwcap2 to x86
On x86 Linux via AT_HWCAP2 the user controlled (by tunables) processor
capabilities are exposed.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41165
MFC after:		2 weeks
2023-07-28 11:56:59 +03:00
Dmitry Chagin
fd745e1db6 linux(4): Use pwd_altroot() to tell namei() about ABI root path
PR:			72920
Differential Revision:	https://reviews.freebsd.org/D40090
MFC after:		2 month
2023-05-29 11:16:46 +03:00
Dmitry Chagin
7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Dmitry Chagin
2456a45929 linux(4): Cleanup includes under amd64/linux
Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after:		2 weeks
2023-02-14 17:46:32 +03:00
Dmitry Chagin
10d16789a3 linux(4): Get rid of the opt_compat.h include.
Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after:		2 weeks
2023-02-12 20:24:32 +03:00
Dmitry Chagin
95b8603427 linux(4): Deduplicate linux_trans_osrel().
MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
9e550625f8 linux(4): Deduplicate linux_fixup_elf().
Use native routines to fixup initial process stack. On Arm64 linux_elf_fixup() is
noop, as it do the stack fixup (room for argc) in the linux_copyout_strings().

MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
7446514533 linux(4): Microoptimize linux_elf.h for future use.
In order to reduce code duplication move coredump support definitions
into the appropriate header and hide private definitions.

MFC after:		1 week
2023-02-02 17:58:06 +03:00
Mitchell Horne
1da65dcb1c linux: populate sv_syscallnames in each sysentvec
This allows the syscallname() function to give a usable result for Linux
ABIs.

Reported by:	jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37199
2022-10-28 18:21:08 -03:00
Kornel Dulęba
361971fbca Rework how shared page related data is stored
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35393
2022-07-18 16:27:32 +02:00
Dmitry Chagin
4a6c2d075d linux(4): Properly restore the thread signal mask after signal delivery on i386
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.

Pointy hat to:		dchagin
MFC after:		2 weeks
2022-05-30 20:03:49 +03:00
Dmitry Chagin
9016ec056a linux(4): Deduplicate bsd_to_linux_trapcode()
As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:58 +03:00
Dmitry Chagin
2434137f69 linux(4): Deduplicate translate_traps()
As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:26 +03:00
Dmitry Chagin
eca368ecb6 Retire sv_transtrap
Call translate_traps directly from sendsig().

MFC after:		2 weeks
2022-05-20 14:54:03 +03:00
Dmitry Chagin
8f9635dc99 linux(4): Retire handmade DWARF annotations from signal trampolines
The Linux exports __kernel_sigreturn and __kernel_rt_sigreturn from the
vdso. Modern glibc's sigaction sets the sa_restorer field of sigaction
to the corresponding vdso __sigreturn, and sets the SA_RESTORER.
Our signal trampolines uses the FreeBSD-way to call a signal handler,
so does not use the sigaction's sa_restorer.

However, as glibc's runtime linker depends on the existment of the vdso
__sigreturn symbols, for all Linuxulators was added separate trampolines
named __sigcode with DWARF anotations and left separate __sigreturn
methods, which are exported.

MFC after:		2 weeks
2022-05-15 21:08:12 +03:00
Dmitry Chagin
6e826d27c3 linux(4): Better naming for ucontext field of struct rt_sigframe
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after:		2 weeks
2022-05-15 21:06:47 +03:00
Dmitry Chagin
21f2461741 linux(4): Move sigframe definitions to separate headers
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after:		2 weeks
2022-05-15 21:03:01 +03:00
Dmitry Chagin
ba279bcd6d linux(4): Cleanup signal trampolines
This is the first stage of a signal trampolines refactoring.

From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.

The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction.  Wherefore handmade DWARD annotations
was used.

While here rephrased and removed excessive comments.

MFC after:		2 weeks
2022-05-15 21:00:05 +03:00
Dmitry Chagin
0b5d5dc376 linux(4): Retire unneeded initialization
Both uc_flags and uc_link are zeroed above. On amd64 and i386 the
uc_link field is not used at all. The UC_FP_XSTATE bit should be set
in the uc_flags if OS xsave knob is turned on (and xsave is implemented).

MFC after:		2 weeks
2022-05-15 20:58:47 +03:00
Dmitry Chagin
5a6a4fb284 linux(4): Implement vdso getcpu for x86.
This is modeled after f2395455 (by kib@).

MFC after:		2 weeks
2022-05-08 17:20:52 +03:00
Dmitry Chagin
d5dc757e84 linux(4): Add compat.linux32.emulate_i386 knob.
Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.

Reviewed by:		Pau Amma (doc), emaste
Differential revision:	https://reviews.freebsd.org/D34708
MFC after:		2 weeks
2022-03-31 21:01:09 +03:00
John Baldwin
6bea696af2 linux_copyout_strings: Use PROC_PS_STRINGS().
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34173
2022-02-04 15:57:57 -08:00
Mark Johnston
3fc21fdd5f sysent: Add a sv_psstringssz field to struct sysentvec
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:42:07 -05:00
Mark Johnston
f04a096049 exec: Simplify sv_copyout_strings implementations a bit
Simplify control flow around handling of the execpath length and signal
trampoline.  Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33703
2021-12-31 12:50:15 -05:00
Konstantin Belousov
a42d362bb5 amd64: centralize definitions of CS_SECURE and EFL_SECURE
Requested by	markj
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:14 +03:00
Dmitry Chagin
0a4b664ae8 linux(4): Add struct clone_args for future clone3 system call.
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.

Differential revision:	https://reviews.freebsd.org/D31474
MFC after:		2 weeks
2021-08-12 11:49:01 +03:00
Dmitry Chagin
de8374df28 fork: Allow ABI to specify fork return values for child.
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D31472
MFC after:		2 weeks
2021-08-12 11:45:25 +03:00
Dmitry Chagin
cf8d74e3fe linux(4): Allow musl brand to use FUTEX_REQUEUE op.
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.

PR:			255947
Submitted by:		Philippe Michaud-Boudreault
Differential Revision:	https://reviews.freebsd.org/D30332
2021-07-20 14:39:20 +03:00
Dmitry Chagin
ae8330b448 linux(4): Add arch name to the some printfs.
Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30904
MFC after:		2 weeks
2021-07-20 10:05:08 +03:00
Dmitry Chagin
09cffde975 linux(4): Fixup the vDSO initialization order.
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();

As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.

Reviewed by:		trasz
Differential Revision:	https://reviews.freebsd.org/D30902
MFC after		2 weeks
2021-07-20 10:02:34 +03:00
Dmitry Chagin
a543556c81 linux(4): Constify vdso install/deinstall.
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.

Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30901
MFC after:		2 weeks
2021-07-20 10:01:47 +03:00
Dmitry Chagin
9931033bbf linux(4); Almost complete the vDSO.
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
  implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
  can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Tested by:		trasz (arm64)
Differential revision:  https://reviews.freebsd.org/D30900
MFC after:              2 weeks
2021-07-20 10:01:18 +03:00
Dmitry Chagin
5fd9cd53d2 linux(4): Modify sv_onexec hook to return an error.
Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D30911
MFC after:		2 weeks
2021-07-20 09:56:25 +03:00
David Chisnall
cf98bc28d3 Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-16 18:06:44 +01:00
David Chisnall
d2b558281a Revert "Pass the syscall number to capsicum permission-denied signals"
This broke the i386 build.

This reverts commit 3a522ba1bc.
2021-07-10 20:26:01 +01:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Edward Tomasz Napierala
447636e43c linux(4): implement coredump support
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables.  Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30019
2021-06-30 22:45:06 +01:00
Edward Tomasz Napierala
435754a59e Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921
2021-06-29 08:49:12 +01:00
Dmitry Chagin
4aae133469 linux(4): Make vDSO defines private.
Hide the vDSO defines to the linux32_sysvec as they are not intended to
be used outside of it. Fix LINUX32_PS_STRINGS, use the size of
struct linux32_ps_strings instead of a numeric constant.

MFC after:	2 weeks
2021-06-25 18:41:04 +03:00
Dmitry Chagin
c1da89fec2 linux(4): Retire linux_kplatform.
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by:		trasz, imp
Differential Revision:	https://reviews.freebsd.org/D30774
MFC after:		2 weeks
2021-06-22 08:36:21 +03:00
Konstantin Belousov
870e197d52 Add quirks for Linux ABI signals handling
Require queueing of the signals with default action, and disable
dequeueing SIGCHLD on wait for live process.

Reported and tested by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:01:35 +03:00
Konstantin Belousov
598f6fb49c linuxolator: Add compat.linux.setid_allowed knob
PR:	21463
Reported by:	kris
Reviewed by:	dchagin
Tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:43:00 +03:00
Dmitry Chagin
f4e801085b linux(4): optimize ksiginfo to siginfo conversion.
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.

MFC after:	2 weeks
2021-06-07 06:06:17 +03:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Konstantin Belousov
94172affa4 amd64: clear debug registers on execing 32bit Linux binary
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00