Commit graph

17594 commits

Author SHA1 Message Date
Mark Johnston
b21b022a81 Revert r364310.
Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.

Discussed with:	mjg
Reported by:	dhw
2020-08-18 14:09:49 +00:00
Gleb Smirnoff
1921bb7b68 With INVARIANTS panic immediately if M_WAITOK is requested in a
non-sleepable context.  Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D26027
2020-08-17 15:37:08 +00:00
Konstantin Belousov
beb27033aa Fix powerpc build.
Sponsored by:	The FreeBSD Foundation
2020-08-16 22:50:59 +00:00
Konstantin Belousov
fbca789fc3 VMIO read
If possible, i.e. if the requested range is resident valid in the vm
object queue, and some secondary conditions hold, copy data for
read(2) directly from the valid cached pages, avoiding vnode lock and
instantiating buffers.  I intentionally do not start read-ahead, nor
handle the advises on the cached range.

Filesystems indicate support for VMIO reads by setting VIRF_PGREAD
flag, which must not be cleared until vnode reclamation.

Currently only filesystems that use vnode pager for v_objects can
enable it, due to reliance on vnp_size.  There is a WIP to handle it
for tmpfs.

Reviewed by:	markj
Discussed with:	jeff
Tested by:	pho
Benchmarked by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 21:02:45 +00:00
Mateusz Guzik
6041408826 vfs: retire vrefl as a symbol
vrefl calls vref and there is only one in-tree consumer.

Keep it as a macro for assertion purposes.
2020-08-16 18:51:12 +00:00
Mateusz Guzik
5faf134cce vfs: assert that VI_TEXT_REF is not already set
2020-08-16 18:45:31 +00:00
Mateusz Guzik
3c5d2ed71f cache: add NOCAPCHECK to the list of supported flags for lockless lookup
It is de facto supported in that lockless lookup does not do any capability
checks.
2020-08-16 18:33:24 +00:00
Mateusz Guzik
8ab4becab0 vfs: use namei_zone for getcwd allocations
instead of malloc.

Note that this should probably be wrapped with a dedicated API and other
vn_getcwd callers did not get converted.
2020-08-16 18:21:21 +00:00
Mateusz Guzik
494c0f2a83 vfs: mark HASBUF as an internal flag
There is no setter for cn_pnbuf.
2020-08-16 17:55:20 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Conrad Meyer
b2d52e5c43 witness(4): Print stack of prior observed lock order on reversal
The first time Witness observes a lock order between two locks, it records
the caller's stack.  On detected reversal, print out that previous observed
stack.  It is quite possible that the reported "LOR" is the correct
ordering, and the violation was the observed earlier ordering.

Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D26070
2020-08-15 19:45:50 +00:00
Jason A. Harmening
f3ba85ccc8 kenv: avoid sleepable alloc for integer tunables
Avoid performing a potentially-blocking malloc for kenv lookups that will only
perform non-destructive integer conversions on the returned buffer. Instead,
perform the strtoq() in-place with the kenv lock held.

While here, factor the logic around kenv_lock acquire and release into
kenv_acquire() and kenv_release(), and use these functions for some light
cleanup. Collapse getenv_string_buffer() into kern_getenv(), as the former
no longer has any other callers and the only additional task performed by
the latter is a WITNESS check that hasn't been useful since r362231.

PR:		248250
Reported by:	gbe
Reviewed by:	mjg
Tested by:	gbe
Differential Revision:	https://reviews.freebsd.org/D26010
2020-08-14 21:37:38 +00:00
Mark Johnston
85232c2ff1 Rename the pipe_map field of struct pipe.
This is to avoid conflicts with an upcoming macro.  pipe_pages is a
more accurate name since the field tracks pages wired into the kernel as
part of a process-to-process copy operation.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-14 14:50:41 +00:00
Conrad Meyer
8a0edc914f Add prng(9) API
Add prng(9) as a replacement for random(9) in the kernel.

There are two major differences from random(9) and random(3):

- General prng(9) APIs (prng32(9), etc) do not guarantee an
  implementation or particular sequence; they should not be used for
  repeatable simulations.

- However, specific named API families are also exposed (for now: PCG),
  and those are expected to be repeatable (when so-guaranteed by the named
  algorithm).

Some minor differences from random(3) and earlier random(9):

- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
  contention on PRNG state in SMP workloads.  Each PCPU generator in an
  SMP system produces a unique sequence.

- Better statistical properties than the Park-Miller ("minstd") PRNG
  (longer period, uniform distribution in all bits, passes
  BigCrush/PractRand analysis).

- Faster than Park-Miller ("minstd") PRNG -- no division is required to
  step PCG-family PRNGs.

For now, random(9) becomes a thin shim around prng32().  Eventually I
would like to mechanically switch consumers over to the explicit API.
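
The "no division" point above can be illustrated with the reference
PCG-XSH-RR 64/32 generator. This is a minimal userspace sketch, not the
kernel code: the names pcg32_state, pcg32_seed and pcg32_next are
hypothetical, and it is an assumption that prng(9) uses this exact
variant. The stream argument stands in for the per-CPU distinction, so
that two generators seeded alike still produce different sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the prng(9) interface. */
struct pcg32_state {
	uint64_t state;	/* LCG state */
	uint64_t inc;	/* stream selector, always odd */
};

/* Step the generator once and return 32 output bits. */
static uint32_t
pcg32_next(struct pcg32_state *s)
{
	uint64_t old = s->state;
	uint32_t xorshifted, rot;

	assert((s->inc & 1) == 1);
	/* LCG step: one multiply and one add, no division or modulo. */
	s->state = old * 6364136223846793005ULL + s->inc;
	/* Output permutation: xorshift, then a data-dependent rotation. */
	xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
	rot = (uint32_t)(old >> 59);
	return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31)));
}

/* Seed a generator; distinct stream values select distinct sequences. */
static void
pcg32_seed(struct pcg32_state *s, uint64_t seed, uint64_t stream)
{
	s->state = 0;
	s->inc = (stream << 1) | 1;
	(void)pcg32_next(s);
	s->state += seed;
	(void)pcg32_next(s);
}
```

Two generators seeded with the same seed and stream repeat each other
exactly, which is the repeatability property expected of the named API
families; changing only the stream gives a different sequence.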

Reviewed by:	kib, markj (previous version both)
Discussed with:	markm
Differential Revision:	https://reviews.freebsd.org/D25916
2020-08-13 20:48:14 +00:00
Mateusz Guzik
b38ad2683a vfs: add missing pwd_drop on error in namei_setup
Reported by:	pho
2020-08-13 10:24:45 +00:00
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt
2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn
2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
2d0631dd08 vfs: stricter validation for flags passed to namei in cn_flags
namei de facto expects that the nameidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this part by explicitly clearing some of them.

Tighten the interface instead.

While here renumber the flags and denote the gap between the 2 variants.

Try to piggyback the renumbering on the just-bumped __FreeBSD_version.
2020-08-11 01:34:40 +00:00
Mateusz Guzik
25e42ee217 vfs: drop the hello world stat probes from the vfs provider
Interested parties can get the same information by hooking on vop_stat.
2020-08-10 18:11:00 +00:00
Mateusz Guzik
5e79447d60 cache: let SAVESTART pass through
The flag is only passed for non-LOOKUP ops and those fall back to the slowpath.
2020-08-10 12:28:56 +00:00
Mateusz Guzik
bb48255cf5 cache: resize struct namecache to a multiple of alignment
For example struct namecache on amd64 is 100 bytes, but it has to occupy
104. Use the extra bytes to support longer names.
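
The arithmetic behind the 100 versus 104 figure can be sketched as
follows. The helpers roundup_pow2 and padding_slack are hypothetical
names for illustration, and the 8 byte alignment is an assumption
inferred from the numbers quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (must be a power of two). */
static size_t
roundup_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((size + align - 1) & ~(align - 1));
}

/* Bytes that would be lost to padding and can hold name data instead. */
static size_t
padding_slack(size_t size, size_t align)
{
	return (roundup_pow2(size, align) - size);
}
```

With a 100 byte struct and 8 byte alignment, roundup_pow2(100, 8) gives
104, so padding_slack(100, 8) reports 4 bytes that were already being
paid for.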
2020-08-10 12:05:55 +00:00
Mateusz Guzik
8b62cebea7 cache: remove unused variables from cache_fplookup_parse
2020-08-10 11:51:56 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
Mateusz Guzik
c571b99545 cache: strlcpy -> memcpy
2020-08-10 10:40:14 +00:00
Mateusz Guzik
3ba0e51703 vfs: partially support file create/delete/rename in lockless lookup
Perform the lookup until the last 2 elements and fall back to the slowpath.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:35:18 +00:00
Mateusz Guzik
21d5af2b30 vfs: drop the thread argument from vfs_fplookup_vexec
It is guaranteed curthread.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:34:22 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
51ea7bea91 vfs: add VOP_STAT
The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910
2020-08-07 23:06:40 +00:00
Mateusz Guzik
1ff80a3400 vfs: release the interlock after failing to set VHOLD_NO_SMR
While here add more comments.

Diagnosed by:	markj
Reported by:	pho
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-07 19:36:08 +00:00
Warner Losh
f7bb4f88c5 Remove obsolete part of comment. It was cut and pasted from the old version of
this function, and was never relevant to the new version.
2020-08-07 18:21:48 +00:00
Hans Petter Selasky
826c079373 Add full support for dynamic allocation and freeing of epochs.
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.

While at it, move the 64 supported epoch control structures to the
static memory domain. Overall, this simplifies the management and
debugging of system epochs.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D25960
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-07 15:32:42 +00:00
Mark Johnston
0ffec1b03d Clean up reassignbuf() and buf_vlist_remove() a bit.
- Convert panic() calls to INVARIANTS-only assertions.  The PCTRIE code
  provides some of the same protection since it will panic upon an
  attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:43:15 +00:00
Mark Johnston
7013797e34 Remove the vfs.reassignbufcalls counter and sysctl.
As the 20-year-old comment above it suggests, the counter is of dubious
value.  Moreover, the (global) counter was not updated precisely and
hurts scalability.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:42:59 +00:00
Mateusz Guzik
e910c93eea cache: add more predicts for failing conditions
2020-08-06 04:20:14 +00:00
Mateusz Guzik
95888901f7 cache: plug uninitialized variable use
CID:	1431128
2020-08-06 04:19:47 +00:00
Mateusz Guzik
bb62c418fd vfs hash: annotate the lock with __exclusive_cache_line
Note the code does not scale in the current form.
2020-08-05 19:34:13 +00:00
Mateusz Guzik
4f00177887 pipe: reduce atime precision
The routine is called on successful write and read, which on pipes happens a
lot and for small sizes.

The default precision seems far greater than necessary, and it causes
problems in VMs on amd64 (it executes rdtscp, which vmexits). getnanotime
seems to provide precision roughly in line with Linux, so we should be
good here.

Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
before: 426464
after: 3247421

Note that the atime handling for named pipes is broken with and without the
patch. The filesystem code is never used for updating atime and never looks
at the updated field. Consequently, while there are no provisions added to
handle named pipes separately, the change is a nop for that case.

Differential Revision:	 https://reviews.freebsd.org/D23964
2020-08-05 19:15:59 +00:00
Andrey V. Elsukov
edde7a538b Add m__getjcl SDT probe.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-08-05 11:39:09 +00:00
Mateusz Guzik
e1b1971c05 cache: don't ignore size passed to nchinittbl
2020-08-05 09:38:02 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
2b86f9d6d0 cache: convert the hash from LIST to SLIST
This reduces struct namecache by sizeof(void *).

Negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.

Note this adds cache_get_hash calls which can be eliminated.
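
The trade-off described above can be sketched in userspace C. A singly
linked chain stores one pointer per entry instead of two, but unlinking
an arbitrary entry means walking from the head to find the link that
points at it. The struct and function names here are hypothetical and
are not the namecache code or the sys/queue.h macros.

```c
#include <assert.h>
#include <stddef.h>

/* One forward pointer per entry; a doubly linked list would need two. */
struct entry {
	struct entry *next;
	int key;
};

/* Push an entry onto the front of the chain. */
static void
chain_insert_head(struct entry **head, struct entry *e)
{
	e->next = *head;
	*head = e;
}

/*
 * Unlink target from the chain.  The walk tracks the address of the
 * link pointing at the current entry, which plays the role of the
 * "previous element".  Returns 1 if the entry was found, 0 otherwise.
 */
static int
chain_remove(struct entry **head, struct entry *target)
{
	struct entry **pp;

	assert(head != NULL);
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return (1);
		}
	}
	return (0);
}
```

The removal cost is proportional to the chain length, which stays small
as long as hash collisions are rare.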
2020-08-05 09:25:59 +00:00
Mateusz Guzik
cf8ac0de81 cache: reduce zone alignment to 8 bytes
It used to be the sizeof of the given struct to accommodate 32 bit mips
doing 64 bit loads, but the same can be achieved by requiring just
64 bit alignment.

While here reorder struct namecache so that most commonly used fields
are closer.
2020-08-05 09:24:38 +00:00
Mateusz Guzik
d61ce7ef50 cache: convert ncneghash into a macro
It is a read-only var with value known at compilation time.
2020-08-05 09:24:00 +00:00
Mateusz Guzik
158ab70c24 vfs: tidy up namei entry point
- predict for string copy errors
- reshuffle initialization of vars which are not needed
2020-08-05 07:33:39 +00:00
Mateusz Guzik
2840f07d4f cache: cleanup lockless entry point
- remove spurious bzero
- assert ni_lcf, it has to be set by namei by this point
2020-08-05 07:32:26 +00:00
Mateusz Guzik
8ccf01e0e2 cache: stop messing with cn_lkflags
See r363882.
2020-08-05 07:30:57 +00:00
Mateusz Guzik
27c4618df5 cache: stop messing with cn_flags
This removes flag setting/unsetting carried over from regular lookup.
Flags still get set for compatibility when falling back.

Note .. and . handling can get partially folded together.
2020-08-05 07:30:17 +00:00