Commit graph

135677 commits

Author SHA1 Message Date
Hans Petter Selasky
6e5baec33c Fix for use-after-free in if_ure(4) driver.
When detaching the if_ure(4) driver, the TX active USB transfer array may
point to freed USB transfers. Given that the number of USB transfers is
very low, simply start all transfers every time there is a packet to
keep safe from use-after-free.

PR: 252608
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-12 17:57:58 +01:00
mhorne
0628f68357 riscv pmap: add some pv list assertions
Ensure that we don't end up with a superpage in the vm_page_t's pv list.

This may help with debugging the panic reported in PR 250866, in which
l3 in pmap_remove_write() was found to be NULL. Adding a KASSERT to this
function will help narrow down the cause of this panic the next time it
occurs.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D28109
2021-01-12 11:12:02 -04:00
Mateusz Guzik
70ba77706d vfs: extend vfs:namei:lookup:return probe with nameidata 2021-01-12 13:35:27 +00:00
Mateusz Guzik
cdb62ab74e vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers
Check the comment above the routine for reasoning.
2021-01-12 13:16:10 +00:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Andrew Turner
c00ec4dab2 Handle using a sub instruction in the arm64 fbt
Some stack frames are too large for a store pair instruction we already
detect in the arm64 fbt code. Add support for handling subtracting the
stack pointer directly.

Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Andrew Turner
d0df1a2d54 Only allow a store through sp in the arm64 fbt
When searching for an instruction to patch out in the arm64 function
boundary trace we search for a store pair with a write back. This
instruction is commonly used to store two registers to the stack
and update the stack pointer to hold space for more.

This works in many cases, however not all functions use this, e.g.
when the stack frame is too large. In these cases we may find another
instruction of the same type that doesn't store through the stack
pointer. Filter these instructions out and assume if we see one we
are past the function prologue.

Reported by:	rwatson
Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Emmanuel Vadot
35a39dc5b3 Bump __FreeBSD_version after linuxkpi changes 2021-01-12 12:31:00 +01:00
Emmanuel Vadot
11d62b6f31 linuxkpi: add kernel_fpu_begin/kernel_fpu_end
With newer AMD GPUs (>=Navi,Renoir) there is FPU context usage in the
amdgpu driver.
The `kernel_fpu_begin/end` implementations in drm did not even allow nested
begin-end blocks.

Submitted by: Greg V
Reviewed By: manu, hselasky
Differential Revision: https://reviews.freebsd.org/D28061
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
2c95fb753f linuxkpi: Add shrinker support
A driver can register a shrinker that will be called when the kernel
wants to free some memory.
Add support for that in linuxkpi and call the registered shrinkers
when the lowmem event is triggered.

Reviewed by:	bz
Differential Revision:	 https://reviews.freebsd.org/D27728
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
105a37cac7 linuxkpi: Add more pci functions needed by DRM
-pci_get_class : This function search for a matching pci device based on
   the class/subclass and returns a newly created pci_dev.
 - pci_{save,restore}_state : This is analogous to ours with the same name
 - pci_is_root_bus : Return true if this is the root bus
 - pci_get_domain_bus_and_slot : This function search for a matching pci
   device based on domain, bus and slot/function concat into a single
   unsigned int (devfn) and returns a newly created pci_dev
 - pci_bus_{read,write}_config* : Read/Write to the config space.

While here add some helper function to alloc and fill the pci_dev struct.

Reviewed by:   hselasky, bz (older version)
Differential Revision:	   https://reviews.freebsd.org/D27550
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
8517a547a0 pci: Add pci_find_class_from
pci_find_class_from help finding one or multiple device matching
a class and subclass.
If the from argument is not null we will first loop in the device list
until we find the matching device and only then start to check if the
class/subclass matches.

Reviewed by:   jhb
Differential Revision:	https://reviews.freebsd.org/D27549
2021-01-12 12:25:28 +01:00
Konstantin Belousov
57f22c828e sigfastblock: do not skip cursig/postsig loop in ast()
Even if sigfastblock block is non-zero, non-blockable signals must be
checked on ast and delivered now.  This also affects debugger ability
to attach, because issignal() also calls ptracestop() if there is
a pending stop for debugee.

Instead of checking for sigfastblock, and either setting PENDING flag
for usermode or doing signal delivery loop, always do the loop after
checking, and then handle PENDING bit. issignal() already does the right
thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to
happen.

Reported by:	Vasily Postnicov <shamaz.mazum@gmail.com>, markj
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28089
2021-01-12 12:45:26 +02:00
Konstantin Belousov
513320c0f1 sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set.
User pending bit should not be set if kernel did not noted a pending signal.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28089
2021-01-12 12:43:34 +02:00
Kristof Provost
f743976583 dtrace: Blacklist riscv exception handlers for fbt
We can't safely instrument those exception handlers, so blacklist them.

Test case: dtrace -n :::

Reviewed by:		markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D27754
2021-01-12 10:33:16 +01:00
Mateusz Guzik
44121a0fbe amd64: fix tlb shootdown when all cpus are passed in the bitmap
Right now the routine leaves the current CPU in the map, later tripping
on an assert when filling in the scoreboard: panic: IPI scoreboard is
zero, initiator 1 target 1

Instead pre-check if all CPUs are present in the map and remember that
outcome for later.

Fixes:	7eaea04a5b ("amd64: compare TLB shootdown target to all_cpus")
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28111
2021-01-12 08:47:32 +00:00
Konstantin Belousov
9402bb44f1 vmspace_fork: preserve wx settings in the child vm map after fork
Noted by:	markj
Sponsored by:	The FreeBSD Foundation
2021-01-12 08:09:59 +02:00
Alan Somers
ff1a307801 lio_listio: validate aio_lio_opcode
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland).  The situation became more serious with
022ca2fc7f.  After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.

Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.

MFC-with:	022ca2fc7f
Reviewed by:	jhb, tmunro, 0mp
Differential Revision:	<https://reviews.freebsd.org/D28078
2021-01-11 19:53:01 -07:00
Andrew Gallatin
7eaea04a5b amd64: compare TLB shootdown target to all_cpus
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap.  This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus.   However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET().  Comparing with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system, otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourself.

Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending ipis in parallel, rather than one-by-one.
This is confirmed by dtrace.

Reviewed by: alc, jhb, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28102
2021-01-11 20:09:32 -05:00
Kirk McKusick
2d4422e799 Eliminate lock order reversal in UFS ffs_unmount().
UFS uses a new "mntfs" pseudo file system which provides private
device vnodes for a file system to safely access its disk device.
The original device vnode is saved in um_odevvp to hold the exclusive
lock on the device so that any attempts to open it for writing will
fail. But it is otherwise unused and has its BO_NOBUFS flag set to
enforce that file systems using mntfs vnodes do not accidentally
use the original devfs vnode. When the file system is unmounted,
um_odevvp is no longer needed and is released.

The lock order reversal happens because device vnodes must be locked
before UFS vnodes. During unmount, the root directory vnode lock
is held. When when calling vrele() on um_odevvp, vrele() attempts to
exclusive lock um_odevvp causing the lock order reversal. The problem
is eliminated by doing a non-blocking exclusive lock on um_odevvp
which will always succeed since there are no users of um_odevvp.
With um_odevvp locked, it can be released using vput which does not
attempt to do a blocking exclusive lock request and thus avoids the
lock order reversal.

Sponsored by: Netflix
2021-01-11 16:49:07 -08:00
Alan Somers
58a08f9e99 [skip ci] Delete an accidentally-committed comment
MFC-With:	19cca0b961
2021-01-11 17:01:22 -07:00
Jason A. Harmening
e8a5a1ad71 rctl(4): support throttling resource usage to 0
For rate-based resources that support throttling (e.g.
readiops/writeips), this fixes a divide-by-zero panic when rctl(8)
passes 0 as the throttle value.  For these resources, treat
zero-throttle requests as requests to suspend forward progress as long
as possible using the duration specified in
kern.racct.rctl.throttle_max.

PR:		251803
Reported by:	chris@cretaforce.gr
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27858
2021-01-11 15:36:57 -08:00
Alexander V. Chernikov
2defbe9f0e Use rn_match instead of doing indirect calls in fib_algo.
Relevant inet/inet6 code has the control over deciding what
 the RIB lookup function currently is. With that in mind,
 explicitly set it to the current value (rn_match) in the
 datapath lookups. This avoids cost on indirect call.

Differential Revision: https://reviews.freebsd.org/D28066
2021-01-11 23:30:35 +00:00
Konstantin Belousov
4ea65707d3 exec_new_vmspace: print useful error message on ctty if stack cannot be mapped.
After old vmspace is destroyed during execve(2), but before the new space
is fully constructed, an error during image activation cannot be returned
because there is no executing program to receive it.

In the relatively common case of failure to map stack, print some hints
on the control terminal.  Note that user has enough knobs to cause stack
mapping error, and this is the most common reason for execve(2) aborting
the process.

Requested by:	jhb
Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28050
2021-01-12 01:15:43 +02:00
Konstantin Belousov
2e1c94aa1f Implement enforcing write XOR execute mapping policy.
It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE |
PROT_EXEC are never specified together, if vm_map has MAP_WX flag set.
FreeBSD control flag allows specific binary to request WX exempt, and
there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/
disable globally.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28050
2021-01-12 01:15:43 +02:00
Kristof Provost
86b653ed7e pf: quiet debugging printfs
Only log these when debugging output is enabled.
2021-01-11 22:30:44 +01:00
John Baldwin
c3e77ab43f arm64: Add armv8crpyto and ossl to NOTES.
Reviewed by:	mhorne
Differential Revision:	https://reviews.freebsd.org/D28099
2021-01-11 14:28:46 -08:00
John Baldwin
36a2f5b817 arm64: Don't disable options GDB in LINT.
Reviewed by:	mhorne
Differential Revision:	https://reviews.freebsd.org/D28098
2021-01-11 14:28:46 -08:00
Vincenzo Maffione
3005e10ddb netmap: vtnet: fix RX initialization after netmap_reset()
At device reset, we must not publish those netmap receive buffers
that are owned by userspace (nm_kr_rxspace).

MFC after:	1 week
2021-01-11 21:38:32 +00:00
Konstantin Belousov
4174e45fb4 amd64 pmap: do not sleep in pmap_allocpte_alloc() with zero referenced page table page.
Otherwise parallel pmap_allocpte_alloc() for nearby va might also fail
allocating page table page and free the page under us.  The end result is
that we could dereference unmapped pte when doing cleanup after sleep.

Instead, on allocation failure, first free everything, only then we can
drop pmap mutex and sleep safely, right before returning to caller.
Split inner non-sleepable part of the pmap_allocpte_alloc() into a new
helper pmap_allocpte_nosleep().

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
9a8f5f5cf5 amd64 pmap: rename _pmap_allocpte() to pmap_allocpte_alloc().
The function performs actual allocation of pte, as opposed to
pmap_allocpte() that uses existing free pte if pt page is already
there. This also moves function out of namespace similar to a language
reserved.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
f67064e592 amd64 pmap: Remove wrong __unused annotation from the va argument.
Noted by:	alc
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
9658d9c71a amd64 pmap: fix NULL deref in pmap_mincore().
pmap_pdpe() might return NULL, check for it.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:52 +02:00
Alexander V. Chernikov
0da3f8c98d Bump amount of queued packets in for unresolved ARP/NDP entries to 16.
Currently default behaviour is to keep only 1 packet per unresolved entry.
Ability to queue more than one packet was added 10 years ago, in r215207,
 though the default value was kep intact.

Things have changed since that time. Systems tend to initiate multiple
 connections at once for a variety of reasons.
For example, recent kern/252278 bug report describe happy-eyeball DNS
 behaviour sending multiple requests to the DNS server.

The primary driver for upper value for the queue length determination is
 memory consumption. Remote actors should not be able to easily exhaust
 local memory by sending packets to unresolved arp/ND entries.

For now, bump value to 16 packets, to match Darwin implementation.

The proper approach would be to switch the limit to calculate memory
 consumption instead of packet count and limit based on memory.

We should MFC this with a variation of D22447.

Reviewers: #manpages, #network, bz, emaste

Reviewed By: emaste, gbe(doc), jilles(doc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28068
2021-01-11 19:51:11 +00:00
Brooks Davis
d7a7d6a7c3 ndis: Per user request, delay removal to 14
We will remove ndis shortly after the 13 branch.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D28049
2021-01-11 18:11:49 +00:00
Roger Pau Monné
d1eb05aa0c xen: remove .swp file from public headers
Should have never been there in the first place.
2021-01-11 18:14:11 +01:00
Mitchell Horne
7dcddacafa cgem: update 64-bit check
The cgem(4) driver was updated to support 64-bit bus addressing in
facdd1cd20. However, the committed version determines this in an
un-idiomatic way. Change the compile-time conditional to check
BUS_SPACE_MAXADDR, rather than comparing int and pointer sizes.

Reported by:	jrtc27
2021-01-11 12:15:32 -04:00
Robert Watson
30b68ecda8 Changes that improve DTrace FBT reliability on freebsd/arm64:
- Implement a dtrace_getnanouptime(), matching the existing
  dtrace_getnanotime(), to avoid DTrace calling out to a potentially
  instrumentable function.

  (These should probably both be under KDTRACE_HOOKS.  Also, it's not clear
  to me that they are correct implementations for the DTrace thread time
  functions they are used in .. fixes for another commit.)

- Don't allow FBT to instrument functions involved in EL1 exception handling
  that are involved in FBT trap processing: handle_el1h_sync() and
  do_el1h_sync().

- Don't allow FBT to instrument DDB and KDB functions, as that makes it
  rather harder to debug FBT problems.

Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel
panics due to recursion in DTrace.

Reliable FBT on FreeBSD/arm64 is reliant on another change from @andrew to
have the aarch64 instrumentor more carefully check that instructions it
replaces are against the stack pointer, which can otherwise lead to memory
corruption.  That change remains under review.

MFC after:	2 weeks
Reviewed by:	andrew, kp, markj (earlier version), jrtc27 (earlier version)
Differential revision:	https://reviews.freebsd.org/D27766
2021-01-11 15:42:22 +00:00
Roger Pau Monne
a765078790 xen/privcmd: implement the restrict ioctl
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.

This allows an open privcmd instance to limit against which domains it
can act upon.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:33:27 +01:00
Roger Pau Monne
ed78016d00 xen/privcmd: implement the dm op ioctl
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.

This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:33:27 +01:00
Roger Pau Monne
658860e2d0 xen/privcmd: implement the map resource ioctl
The interface is mostly the same as the Linux ioctl, so that we don't
need to modify the user-space libraries that make use of it.

The ioctl is just a proxy for the XENMEM_acquire_resource hypercall.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:15:00 +01:00
Roger Pau Monné
147e593921 xen/privcmd: split setup of virtual address range into helper
Preparatory change for further additions that will also make use of
the same code. No functional change.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:14:59 +01:00
Roger Pau Monné
f713a5b37e xen/privcmd: make some integers unsigned
There's no reason for them to be signed. No functional change.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:14:59 +01:00
Roger Pau Monné
5ed9deef6b xen: update interface headers
This is a verbatim copy of the public headers from Xen 4.14.1.

No functional change intended.

Sponsored by: Citrix Systems R&D
2021-01-11 16:14:59 +01:00
Ryan Libby
16079c7233 hid: quiet -Wswitch
Gcc builds complained that not all switch cases are handled.  Add
default cases to appease gcc.

Reviewed by:	hselasky (previous version), wulf
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28082
2021-01-10 21:53:15 -08:00
Ryan Libby
c86fa3b8d7 pf: quiet -Wredundant-decls for pf_get_ruleset_number
In e86bddea9f sys/netpfil/pf/pf.h grew a
declaration of pf_get_ruleset_number.  Now delete the old declaration
from sys/net/pfvar.h.

Reviewed by:	kp
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28081
2021-01-10 21:53:15 -08:00
Alexander V. Chernikov
685de460bc Use static initializers for fib algo to shift initialization
to ealier stage. This allows to register modules loaded at
 boot time.

Reported by:	olivier
2021-01-11 00:16:54 +00:00
Vincenzo Maffione
55f0ad5fde netmap: restore hwofs and support it in iflib
Restore the hwofs functionality temporarily disabled by
7ba6ecf216 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.

PR:		252453
MFC after:	1 week
2021-01-10 22:51:15 +00:00
Konstantin Belousov
a013e285df x86 tsc: mark %eax as earlyclobber in tscp_get_timecount_low().
i386 codegen insists on preloading tc_priv into register on i386, and
this register cannot be %eax because RDTSCP instruction clobbers it
before it is used.

Reported and tested by:	dim
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-11 00:05:49 +02:00
Rick Macklem
148a227bf8 nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs
Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of
the arguments for the M_EXTPG case.

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28053
2021-01-10 13:50:15 -08:00