opnsense-src/sys/amd64
Don Lewis cd155b5603 Lower the amd64 shared page, which contains the signal trampoline,
from the top of user memory to one page lower on machines with the
Ryzen (AMD Family 17h) CPU.  This pushes ps_strings and the stack
down by one page as well.  On Ryzen there is some sort of interaction
between code running at the top of user memory address space and
interrupts that can cause FreeBSD to either hang or silently reset.
This sounds similar to the problem found with DragonFly BSD that
was fixed with this commit:
  https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20
but our signal trampoline location was already lower than the address
that DragonFly moved their signal trampoline to.  It also does not
appear to be related to SMT as described here:
  https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498

  "Hi, Matt Dillon here. Yes, I did find what I believe to be a
   hardware issue with Ryzen related to concurrent operations. In a
   nutshell, for any given hyperthread pair, if one hyperthread is
   in a cpu-bound loop of any kind (can be in user mode), and the
   other hyperthread is returning from an interrupt via IRETQ, the
   hyperthread issuing the IRETQ can stall indefinitely until the
   other hyperthread with the cpu-bound loop pauses (aka HLT until
   next interrupt). After this situation occurs, the system appears
   to destabilize. The situation does not occur if the cpu-bound
   loop is on a different core than the core doing the IRETQ. The
   %rip the IRETQ returns to (e.g. userland %rip address) matters a
   *LOT*. The problem occurs more often with high %rip addresses
   such as near the top of the user stack, which is where DragonFly's
   signal trampoline traditionally resides. So a user program taking
   a signal on one thread while another thread is cpu-bound can cause
   this behavior. Changing the location of the signal trampoline
   makes it more difficult to reproduce the problem. I have not
   been because the able to completely mitigate it. When a cpu-thread
   stalls in this manner it appears to stall INSIDE the microcode
   for IRETQ. It doesn't make it to the return pc, and the cpu thread
   cannot take any IPIs or other hardware interrupts while in this
   state."
since the system instability has been observed on FreeBSD with SMT
disabled.  Interrupts to appear to play a factor since running a
signal-intensive process on the first CPU core, which handles most
of the interrupts on my machine, is far more likely to trigger the
problem than running such a process on any other core.

Also lower sv_maxuser to prevent a malicious user from using mmap()
to load and execute code in the top page of user memory that was made
available when the shared page was moved down.

Make the same changes to the 64-bit Linux emulator.

PR:		219399
Reported by:	nbe@renzel.net
Reviewed by:	kib
Reviewed by:	dchagin (previous version)
Tested by:	nbe@renzel.net (earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11780
2017-08-02 01:43:35 +00:00
..
acpica Ensure that resume path on amd64 only accesses page tables for normal 2017-05-15 20:52:43 +00:00
amd64 Lower the amd64 shared page, which contains the signal trampoline, 2017-08-02 01:43:35 +00:00
cloudabi32 Move struct syscall_args syscall arguments parameters container into 2017-06-12 21:03:23 +00:00
cloudabi64 Move struct syscall_args syscall arguments parameters container into 2017-06-12 21:03:23 +00:00
conf An MMC/SD/SDIO stack using CAM 2017-07-09 16:57:24 +00:00
ia32 Translate between abridged and full x87 tags for compat32 2017-06-24 11:38:31 +00:00
include Lower the amd64 shared page, which contains the signal trampoline, 2017-08-02 01:43:35 +00:00
linux Lower the amd64 shared page, which contains the signal trampoline, 2017-08-02 01:43:35 +00:00
linux32 Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code. 2017-07-30 21:24:20 +00:00
pci pcicfg: Fix direct calls of pci_cfg{read,write} on systems w/o PCI host bridge. 2017-05-04 05:28:46 +00:00
vmm amd-vi: gcc build errors 2017-07-07 06:37:19 +00:00
Makefile Bring the tags and links entries for amd64 up to date. 2015-10-27 22:59:24 +00:00