Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.
This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.
Reviewed by: mjg
MFC after: 2 weeks
(cherry picked from commit 6210ac95a1)
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde(). Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd11.
Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.
Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c6d092b510)
It is unused, especially now that the underlying d_dumper methods do not
accept the argument.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35174
(cherry picked from commit db71383b88)
Statistic for "number of vm exits handled in userspace" should be
increased in vm_run() instead of vmx_run() because in some cases
vm_run() doesn't exit to userspace and keeps entering the guest.
Also svm_run's implementation even wrongly misses that stat.
Reviewed by: markj
(cherry picked from commit e7d34aeda4)
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.
Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.
In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.
Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.
Reviewed by: jhb
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35253
(cherry picked from commit 3ba952e1a2)
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.
Pointy hat to: dchagin
MFC after: 2 weeks
(cherry picked from commit 4a6c2d075d)
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.
MFC after: 2 weeks
(cherry picked from commit c30a767c6f)
Use kern_sigprocmask() instead of direct manipulation of td_sigmask
to reschedule newly blocked signals.
MFC after: 2 weeks
(cherry picked from commit 2ab9b59faa)
Move sigprocmask actions defines under compat/linux,
they are identical across all Linux architectures.
MFC after: 2 weeks
(cherry picked from commit 2ca34847e7)
To solve y2k38 problem in the recvmsg syscall the new SO_TIMESTAMP
constant were added on v5.1 Linux kernel. So, old 32-bit binaries
that knows only 32-bit time_t uses the old value of the constant,
and binaries that knows 64-bit time_t uses the new constant.
To determine what size of time_t type is expected by the user-space,
store requested value (SO_TIMESTAMP) in the process emuldata structure.
MFC after: 2 weeks
(cherry picked from commit 0e26e54bdf)
As linux_execve is common across archs, except amd64 32-bit Linuxulator,
move it under compat/linux.
Noted by: andrew@
MFC after: 2 weeks
(cherry picked from commit 26700ac0c4)
The Linux exports __kernel_sigreturn and __kernel_rt_sigreturn from the
vdso. Modern glibc's sigaction sets the sa_restorer field of sigaction
to the corresponding vdso __sigreturn, and sets the SA_RESTORER.
Our signal trampolines uses the FreeBSD-way to call a signal handler,
so does not use the sigaction's sa_restorer.
However, as glibc's runtime linker depends on the existment of the vdso
__sigreturn symbols, for all Linuxulators was added separate trampolines
named __sigcode with DWARF anotations and left separate __sigreturn
methods, which are exported.
MFC after: 2 weeks
(cherry picked from commit 8f9635dc99)
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.
MFC after: 2 weeks
(cherry picked from commit 6e826d27c3)
Rework the defintion of struct siginfo so that the array padding
struct siginfo to SI_MAX_SIZE can be placed in a union along side of the
rest of the struct siginfo members. The result is that we no longer need
the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions.
Move struct siginfo definition under /compat/linux to reduce MD part.
To avoid headers polution include linux_siginfo.h in the MD linux.h
MFC after: 2 weeks
(cherry picked from commit af557e649c)
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.
MFC after: 2 weeks
(cherry picked from commit 21f2461741)
This is the first stage of a signal trampolines refactoring.
From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.
The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction. Wherefore handmade DWARD annotations
was used.
While here rephrased and removed excessive comments.
MFC after: 2 weeks
(cherry picked from commit ba279bcd6d)
Both uc_flags and uc_link are zeroed above. On amd64 and i386 the
uc_link field is not used at all. The UC_FP_XSTATE bit should be set
in the uc_flags if OS xsave knob is turned on (and xsave is implemented).
MFC after: 2 weeks
(cherry picked from commit 0b5d5dc376)
On i386 are two semtimedop. The old one is called via multiplexor and
uses 32-bit timespec, and new semtimedop_tim64, which is uses 64-bit
timespec.
MFC after: 2 weeks
(cherry picked from commit 3245a2ecea)
In i386 Linux semop called via ipc() multiplexor, so use kern_semop
directly from multiplexor.
MFC after: 2 weeks
(cherry picked from commit f48a68874b)
As the Linux semop syscall is not defined in i386, and as it is equal
to the native semop syscall, call it directly.
Fix semop definition to match Linux actual one - nsops is size_t type.
MFC after: 2 weeks
(cherry picked from commit f686092664)