Commit graph

98 commits

Author SHA1 Message Date
Mitchell Horne
d96eebfdb3 linux: populate sv_syscallnames in each sysentvec
This allows the syscallname() function to give a usable result for Linux
ABIs.

Reported by:	jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37199

(cherry picked from commit 1da65dcb1c)
(cherry picked from commit f396f9b6c9)
2022-11-06 10:54:46 -04:00
Dmitry Chagin
c272720e2a linux(4): Properly build argument list for the signal handler
Provide arguments 2 and 3 if signal handler installed with SA_SIGINFO.

MFC after:		2 weeks

(cherry picked from commit 109fd18ad9)
2022-06-17 22:35:38 +03:00
Dmitry Chagin
fa6d9e24f4 linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.

MFC after:		2 weeks

(cherry picked from commit c30a767c6f)
2022-06-17 22:35:38 +03:00
Dmitry Chagin
0f23fc29f9 linux(4): Avoid direct manipulation of td_sigmask
Use kern_sigprocmask() instead of direct manipulation of td_sigmask
to reschedule newly blocked signals.

MFC after:		2 weeks

(cherry picked from commit 2ab9b59faa)
2022-06-17 22:35:37 +03:00
Dmitry Chagin
06f6414a64 linux(4): Deduplicate bsd_to_linux_trapcode()
As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks

(cherry picked from commit 9016ec056a)
2022-06-17 22:35:28 +03:00
Dmitry Chagin
f327f595cb linux(4): Deduplicate translate_traps()
As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks

(cherry picked from commit 2434137f69)
2022-06-17 22:35:28 +03:00
Dmitry Chagin
3cf95e49cb Retire sv_transtrap
Call translate_traps directly from sendsig().

MFC after:		2 weeks

(cherry picked from commit eca368ecb6)
2022-06-17 22:35:27 +03:00
Dmitry Chagin
c2704d3780 linux(4): Better naming for ucontext field of struct rt_sigframe
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after:		2 weeks

(cherry picked from commit 6e826d27c3)
2022-06-17 22:35:21 +03:00
Dmitry Chagin
3f3bfb8266 linux(4): Move sigframe definitions to separate headers
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after:		2 weeks

(cherry picked from commit 21f2461741)
2022-06-17 22:35:20 +03:00
Dmitry Chagin
0fae97fd1c linux(4): Cleanup signal trampolines
This is the first stage of a signal trampolines refactoring.

From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.

The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction.  Wherefore handmade DWARD annotations
was used.

While here rephrased and removed excessive comments.

MFC after:		2 weeks

(cherry picked from commit ba279bcd6d)
2022-06-17 22:35:19 +03:00
Dmitry Chagin
e7a07ad1db linux(4): Implement vdso getcpu for x86.
This is modeled after f2395455 (by kib@).

MFC after:		2 weeks

(cherry picked from commit 5a6a4fb284)
2022-06-17 22:35:00 +03:00
Edward Tomasz Napierala
feb67d438b linux(4): Reduce diffs between linux_rt_sendsig() and sendsig()
No functional changes (except for the uprintf).

Discussed With:	kib
Sponsored By:	EPSRC

(cherry picked from commit f7b04c53de)
2022-06-17 22:33:41 +03:00
Dmitry Chagin
a1e6526440 linux(4): Remove the unnecessary spaces.
MFC after:		2 weeks

(cherry picked from commit bed2ac27a1)
2022-06-17 22:33:33 +03:00
Dmitry Chagin
8fb3f959dc linux(4): Add struct clone_args for future clone3 system call.
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.

Differential revision:	https://reviews.freebsd.org/D31474
MFC after:		2 weeks

(cherry picked from commit 0a4b664ae8)
2022-06-17 22:33:30 +03:00
Dmitry Chagin
5a66ec2748 fork: Allow ABI to specify fork return values for child.
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by:            kib
Differential revision:  https://reviews.freebsd.org/D31472
MFC after:              2 weeks

(cherry picked from commit de8374df28)
2022-06-17 22:33:28 +03:00
Dmitry Chagin
6698c1a28d linux(4): Allow musl brand to use FUTEX_REQUEUE op.
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.

PR:			255947
Submitted by:		Philippe Michaud-Boudreault
Differential Revision:	https://reviews.freebsd.org/D30332

(cherry picked from commit cf8d74e3fe)
2022-06-17 22:33:12 +03:00
Dmitry Chagin
2e084b4b54 Drop "All rights reserved" from my copyright statements.
Add email and fixup years while here.

Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D30912
MFC after:		2 weeks

(cherry picked from commit 1ca6b15bbd)
2022-06-17 22:33:09 +03:00
Dmitry Chagin
896a4b7c79 linux(4): Add arch name to the some printfs.
Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30904
MFC after:		2 weeks

(cherry picked from commit ae8330b448)
2022-06-17 22:33:08 +03:00
Dmitry Chagin
4e34c0445c linux(4): Fixup the vDSO initialization order.
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();

As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.

Reviewed by:		trasz
Differential Revision:	https://reviews.freebsd.org/D30902
MFC after		2 weeks

(cherry picked from commit 09cffde975)
2022-06-17 22:33:07 +03:00
Dmitry Chagin
b960dd4815 linux(4): Constify vdso install/deinstall.
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.

Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30901
MFC after:		2 weeks

(cherry picked from commit a543556c81)
2022-06-17 22:33:07 +03:00
Dmitry Chagin
a340b5b4bd linux(4); Almost complete the vDSO.
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
  implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
  can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Relnotes:		yes
Tested by:              trasz (arm64)
Differential revision:  https://reviews.freebsd.org/D30900
MFC after:              2 weeks

(cherry picked from commit 9931033bbf)
2022-06-17 22:33:07 +03:00
Dmitry Chagin
54689a282a linux(4): Modify sv_onexec hook to return an error.
Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by:            kib
Differential Revision:  https://reviews.freebsd.org/D30911
MFC after:              2 weeks

(cherry picked from commit 5fd9cd53d2)
2022-06-17 22:33:05 +03:00
Edward Tomasz Napierala
1b196c07b8 linux(4): implement coredump support
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables.  Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30019

(cherry picked from commit 447636e43c)
2022-06-17 22:33:02 +03:00
Dmitry Chagin
0cd177a3d3 linux(4): Retire linux_kplatform.
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by:            trasz, imp
Differential Revision:  https://reviews.freebsd.org/D30774
MFC after:              2 weeks

(cherry picked from commit c1da89fec2)
2022-06-17 22:30:23 +03:00
Edward Tomasz Napierala
b0774c8b90 linux: fix sigaltstack on amd64
To determine whether to use alternate signal stack or not,
we need to use the native signal number, not the one translated
with bsd_to_linux_signal().

In practical terms, this fixes golang.

Reviewed By:	dchagin
Fixes:		135dd0cab5
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31298

(cherry picked from commit b54838003c)
2022-06-17 22:30:21 +03:00
Edward Tomasz Napierala
e056a23793 linux: reduce differences between rt_sendsig() and sendsig()
This makes it easier to compare the two.  This involves moving
the mutex slightly lower down, but there should be no functional
changes.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30541

(cherry picked from commit 135dd0cab5)
2022-06-17 22:30:20 +03:00
Dmitry Chagin
27c31bdac5 linux(4): optimize ksiginfo to siginfo conversion.
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.

MFC after:	2 weeks

(cherry picked from commit f4e801085b)
2022-06-17 22:28:00 +03:00
Edward Tomasz Napierala
f707a6cf1a Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921

(cherry picked from commit 435754a59e)
2022-05-12 15:12:59 -07:00
Edward Tomasz Napierala
187a635352 linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609

(cherry picked from commit ca6e1fa3ce)
2022-02-13 21:06:37 +00:00
Mark Johnston
947e849150 sysent: Add a sv_psstringssz field to struct sysentvec
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 3fc21fdd5f)
2022-01-31 09:48:11 -05:00
Mark Johnston
d247611467 exec: Introduce the PROC_PS_STRINGS() macro
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro.  With stack address randomization the
ps_strings address is no longer fixed.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 706f4a81a8)
2022-01-31 09:46:57 -05:00
Mark Johnston
1562fe492a exec: Simplify sv_copyout_strings implementations a bit
Simplify control flow around handling of the execpath length and signal
trampoline.  Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit f04a096049)
2022-01-14 08:50:06 -05:00
Konstantin Belousov
682f5b9385 amd64: consistently use uprintf() to report weird situations in sigreturn
(cherry picked from commit 2e79a21632)
2021-10-10 12:21:17 +03:00
Konstantin Belousov
61b29720ef amd64: centralize definitions of CS_SECURE and EFL_SECURE
(cherry picked from commit a42d362bb5)
2021-10-10 12:21:17 +03:00
Konstantin Belousov
52d8029e93 Add quirks for Linux ABI signals handling
(cherry picked from commit 870e197d52)
2021-06-22 04:45:32 +03:00
Konstantin Belousov
dc107fe1f9 linuxolator: Add compat.linux.setid_allowed knob
PR:	21463

(cherry picked from commit 598f6fb49c)
2021-06-13 04:22:33 +03:00
Konstantin Belousov
6c74b12295 amd64 linux64: use x86_clear_dbregs()
(cherry picked from commit 2f15884747)
2021-04-23 14:14:08 +03:00
Mark Johnston
3df2766a69 linux: Unmap the VDSO page when unloading
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO.  When destroying the VDSO object, we failed to
destroy the mapping and free KVA.  Fix this.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28696

(cherry picked from commit 0fc8a79672)
2021-02-22 20:29:55 -05:00
Konstantin Belousov
4815f175d0 Linuxolator: Replace use of eventhandlers by sysent hooks.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 18:18:16 +00:00
Edward Tomasz Napierala
866b1f5147 Fix misnomer - linux_to_bsd_errno() does the exact opposite.
Reported by:	arichardson
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26965
2020-10-27 12:49:40 +00:00
Edward Tomasz Napierala
6221ec6064 Stop calling set_syscall_retval() from linux_set_syscall_retval().
The former clobbers some registers that shouldn't be touched.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26406
2020-10-18 16:16:22 +00:00
Edward Tomasz Napierala
1e2521ffae Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26458
2020-09-27 18:47:06 +00:00
Edward Tomasz Napierala
70890254b3 Get rid of sv_errtbl and SV_ABI_ERRNO().
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26388
2020-09-17 11:39:33 +00:00
Edward Tomasz Napierala
c26391f4dd Move SV_ABI_ERRNO translation into linux-specific code, to simplify
the syscall path and declutter it a bit.  No functional changes intended.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26378
2020-09-15 16:41:21 +00:00
Konstantin Belousov
9ce875d9b5 amd64 pmap: LA57 AKA 5-level paging
Since LA57 was moved to the main SDM document with revision 072, it
seems that we should have a support for it, and silicons are coming.

This patch makes pmap support both LA48 and LA57 hardware.  The
selection of page table level is done at startup, kernel always
receives control from loader with 4-level paging.  It is not clear how
UEFI spec would adapt LA57, for instance it could hand out control in
LA57 mode sometimes.

To switch from LA48 to LA57 requires turning off long mode, requesting
LA57 in CR4, then re-entering long mode.  This is somewhat delicate
and done in pmap_bootstrap_la57().  AP startup in LA57 mode is much
easier, we only need to toggle a bit in CR4 and load right value in CR3.

I decided to not change kernel map for now.  Single PML5 entry is
created that points to the existing kernel_pml4 (KML4Phys) page, and a
pml5 entry to create our recursive mapping for vtopte()/vtopde().
This decision is motivated by the fact that we cannot overcommit for
KVA, so large space there is unusable until machines start providing
wider physical memory addressing.  Another reason is that I do not
want to break our fragile autotuning, so the KVA expansion is not
included into this first step.  Nice side effect is that minidumps are
compatible.

On the other hand, (very) large address space is definitely
immediately useful for some userspace applications.

For userspace, numbering of pte entries (or page table pages) is
always done for 5-level structures even if we operate in 4-level mode.
The pmap_is_la57() function is added to report the mode of the
specified pmap, this is done not to allow simultaneous 4-/5-levels
(which is not allowed by hw), but to accomodate for EPT which has
separate level control and in principle might not allow 5-leve EPT
despite x86 paging supports it. Anyway, it does not seems critical to
have 5-level EPT support now.

Tested by:	pho (LA48 hardware)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:19:04 +00:00
Mark Johnston
0cfac4d5c6 Handle getcpu() calls in vsyscall emulation on amd64.
linux_getcpu() has been implemented since r356241.

PR:		246339
Submitted by:	John Hay <john@sanren.ac.za>
MFC after:	1 week
2020-05-31 18:20:20 +00:00
Brooks Davis
b24e6ac8b7 Convert canary, execpathp, and pagesizes to pointers.
Use AUXARGS_ENTRY_PTR to export these pointers.  This is a followup to
r359987 and r359988.

Reviewed by:	jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24446
2020-04-16 21:53:17 +00:00
Edward Tomasz Napierala
b5f20658ee Add compat.linux.emul_path, so it can be set to something other
than "/compat/linux".  Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.

Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22574
2019-12-16 20:07:04 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00