Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.
Discussed with: mjg
Reported by: dhw
non-sleepable context. Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D26027
If possible, i.e. if the requested range is resident valid in the vm
object queue, and some secondary conditions hold, copy data for
read(2) directly from the valid cached pages, avoiding vnode lock and
instantiating buffers. I intentionally do not start read-ahead, nor
handle the advises on the cached range.
Filesystems indicate support for VMIO reads by setting VIRF_PGREAD
flag, which must not be cleared until vnode reclamation.
Currently only filesystems that use vnode pager for v_objects can
enable it, due to reliance on vnp_size. There is a WIP to handle it
for tmpfs.
Reviewed by: markj
Discussed with: jeff
Tested by: pho
Benchmarked by: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25968
The first time Witness observes a lock order between two locks, it records
the caller's stack. On detected reversal, print out that previous observed
stack. It is quite possible that the reported "LOR" is the correct
ordering, and the violation was the observed earlier ordering.
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D26070
Avoid performing a potentially-blocking malloc for kenv lookups that will only
perform non-destructive integer conversions on the returned buffer. Instead,
perform the strtoq() in-place with the kenv lock held.
While here, factor the logic around kenv_lock acquire and release into
kenv_acquire() and kenv_release(), and use these functions for some light
cleanup. Collapse getenv_string_buffer() into kern_getenv(), as the former
no longer has any other callers and the only additional task performed by
the latter is a WITNESS check that hasn't been useful since r362231.
PR: 248250
Reported by: gbe
Reviewed by: mjg
Tested by: gbe
Differential Revision: https://reviews.freebsd.org/D26010
This is to avoid conflicts with a upcoming macro. pipe_pages is a
more accurate name since the field tracks pages wired into the kernel as
part of a process-to-process copy operation.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Add prng(9) as a replacement for random(9) in the kernel.
There are two major differences from random(9) and random(3):
- General prng(9) APIs (prng32(9), etc) do not guarantee an
implementation or particular sequence; they should not be used for
repeatable simulations.
- However, specific named API families are also exposed (for now: PCG),
and those are expected to be repeatable (when so-guaranteed by the named
algorithm).
Some minor differences from random(3) and earlier random(9):
- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
contention on PRNG state in SMP workloads. Each PCPU generator in an
SMP system produces a unique sequence.
- Better statistical properties than the Park-Miller ("minstd") PRNG
(longer period, uniform distribution in all bits, passes
BigCrush/PractRand analysis).
- Faster than Park-Miller ("minstd") PRNG -- no division is required to
step PCG-family PRNGs.
For now, random(9) becomes a thin shim around prng32(). Eventually I
would like to mechanically switch consumers over to the explicit API.
Reviewed by: kib, markj (previous version both)
Discussed with: markm
Differential Revision: https://reviews.freebsd.org/D25916
The conversion was largely mechanical: sed(1) with:
-e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
-e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'
The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.
No functional change.
This removes a lot of special casing from the VFS layer.
Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D25612
namei de facto expects that the naimeidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this part by explicitly clearing some of them.
Tighten the interface instead.
While here renumber the flags and denote the gap between the 2 variants.
Try to piggy back th renumber on the just bumped __FreeBSD_version.
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.
Tested by: pho
Sponsored by: The FreeBSD Foundation
The current scheme of calling VOP_GETATTR adds avoidable overhead.
An example with tmpfs doing fstat (ops/s):
before: 7488958
after: 7913833
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D25910
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.
While at it, move the 64 supported epoch control structures to the
static memory domain. This overall simplifies the management and
debugging of system epoch's.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D25960
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Convert panic() calls to INVARIANTS-only assertions. The PCTRIE code
provides some of the same protection since it will panic upon an
attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
As the 20-year old comment above it suggests, the counter is of dubious
value. Moreover, the (global) counter was not updated precisely and
hurts scalability.
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
The routine is called on successful write and read, which on pipes happens a
lot and for small sizes.
Precision provided by default seems way bigger than necessary and it causes
problems in vms on amd64 (it rdtscp's which vmexits). getnanotime seems to
provide the level roughly in lines of Linux so we should be good here.
Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
before: 426464
after: 3247421
Note the that atime handling for named pipes is broken with and without the
patch. The filesystem code is never used for updating atime and never looks
at the updated field. Consequently, while there are no provisions added to
handle named pipes separately, the change is a nop for that case.
Differential Revision: https://reviews.freebsd.org/D23964
This reduces struct namecache by sizeof(void *).
Negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.
Note this adds cache_get_hash calls which can be eliminated.
It used to be sizeof of the given struct to accomodate for 32 bit mips
doing 64 bit loads, but the same can be achieved with requireing just
64 bit alignment.
While here reorder struct namecache so that most commonly used fields
are closer.
This removes flag setting/unsetting carried over from regular lookup.
Flags still get for compatibility when falling back.
Note .. and . handling can get partially folded together.