4624 commits

Author SHA1 Message Date
Doug Moore
48d01e24f9 vm_addr_ok: add power2 invariant check
With INVARIANTS defined, have vm_addr_align_ok and vm_addr_bound_ok
panic when passed an alignment/boundary parameter that is not a power
of two.

Reviewed by:	alc
Suggested by:	kib, se
Differential Revision:	https://reviews.freebsd.org/D33725

(cherry picked from commit ae13829ddc)
2022-07-12 11:26:12 -05:00
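The checks described in this commit can be sketched in user space roughly as follows. This is an illustration only, not the kernel source: the `sketch_` names are hypothetical, `assert` stands in for the kernel's invariant check, and treating a zero boundary as "no restriction" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x has at most one bit set (zero passes too). */
static inline bool
sketch_powerof2(uint64_t x)
{
	return ((x & (x - 1)) == 0);
}

/* True if pa is a multiple of alignment, which must be a power of two. */
static inline bool
sketch_addr_align_ok(uint64_t pa, uint64_t alignment)
{
	assert(sketch_powerof2(alignment));	/* stand-in for the invariant check */
	return ((pa & (alignment - 1)) == 0);
}

/*
 * True if the range [pa, pa + size) does not cross a multiple of boundary.
 * The first and last byte addresses must agree in every bit at or above
 * the boundary bit.  A zero boundary yields an all-zero mask, so every
 * range passes (assumption of this sketch).
 */
static inline bool
sketch_addr_bound_ok(uint64_t pa, uint64_t size, uint64_t boundary)
{
	assert(sketch_powerof2(boundary));	/* stand-in for the invariant check */
	return (((pa ^ (pa + size - 1)) & (0 - boundary)) == 0);
}
```

For example, a 0x1000-byte range starting at 0x1800 crosses the 0x2000 boundary and is rejected, while the same range starting at 0x1000 is accepted.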
Doug Moore
6f387a5632 vm_reserv: #include vm_extern.h explicitly, for arm.
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
(cherry picked from commit f76916c095)
2022-07-12 11:26:06 -05:00
Doug Moore
bf27b9bc7f vm_phys: convert error back to warning
Move an assignment back to where it was before, to turn the
defined-but-not-used error back into a set-but-not-used warning.

Fixes:	01e115ab83 vm_phys: #include vm_extern
(cherry picked from commit e6930b1c5f)
2022-07-12 11:26:05 -05:00
Doug Moore
87e6f3d27e vm_phys: #include vm_extern
Arm64 and powerpc don't include vm_extern.h indirectly in vm_phys.c, which
means that for the sake of those architectures, it must be included explicitly.

Also, fix a set-but-unused warning that Jenkins found.

Reported by:	Jenkins
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere

(cherry picked from commit 01e115ab83)
2022-07-12 11:26:04 -05:00
Doug Moore
c5a5a9dbcf vm_extern: use standard address checkers everywhere
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D33685

(cherry picked from commit c606ab59e7)
2022-07-12 11:26:03 -05:00
Doug Moore
b1290f4746 vm_reserv: use enhanced bitstring for popmaps
vm_reserv.c uses its own bitstring implementation for popmaps. Using
the bitstring_t type from a standard header eliminates the code
duplication, allows some bit-at-a-time operations to be replaced with
more efficient bitstring range operations, and, in
vm_reserv_test_contig, allows bit_ffc_area_at to more efficiently
search for a big-enough set of consecutive zero-bits.

Make bitstring changes that improve the vm_reserv code.  Define a bit_ntest
method to test whether a range of bits is all set, or all clear.
Define bit_ff_at and bit_ff_area_at to implement the ffs and ffc
versions with a parameter to choose between set- and clear- bits.
Improve the area_at implementation.  Modify the bit_nset and
bit_nclear implementations to allow code optimization in the cases
when start or end are multiples of _BITSTR_BITS.

Add a few new cases to bitstring_test.

Discussed with:	alc
Reviewed by:	markj
Tested by:	pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D33312

(cherry picked from commit 84e2ae64c5)
2022-07-11 00:54:06 -05:00
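A range test of the kind this commit describes (are all bits in a range set, or all clear) might look like the sketch below. It is not the bitstring header's code: the function name, the inclusive `[start, end]` convention, and the use of `unsigned long` words are assumptions made for illustration. It examines one word at a time with a mask rather than one bit at a time.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical range test: true if every bit in the inclusive range
 * [start, end] equals `set`.  Each word is compared against a mask that
 * covers only the bits of that word lying inside the range.
 */
static inline bool
sketch_range_test(const unsigned long *bits, size_t start, size_t end,
    bool set)
{
	size_t first = start / SKETCH_WORD_BITS;
	size_t last = end / SKETCH_WORD_BITS;

	for (size_t w = first; w <= last; w++) {
		unsigned long mask = ~0UL;

		if (w == first)		/* drop bits below start */
			mask &= ~0UL << (start % SKETCH_WORD_BITS);
		if (w == last)		/* drop bits above end */
			mask &= ~0UL >>
			    (SKETCH_WORD_BITS - 1 - end % SKETCH_WORD_BITS);
		if ((bits[w] & mask) != (set ? mask : 0UL))
			return (false);
	}
	return (true);
}
```

With a bitmap whose only set bits are 4 through 7, the call for range 4..7 with `set` true succeeds, while range 3..7 fails because bit 3 is clear.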
Doug Moore
88a5d20e90 vm: alloc pages from reserv before breaking it
Function vm_reserv_reclaim_contig breaks a reservation with enough
free space to satisfy an allocation request and returns the free space
to the buddy allocator. Change the function to allocate the request
memory from the reservation before breaking it, and return that memory
to the caller. That avoids a second call to the buddy allocator and
guarantees successful allocation after breaking the reservation, where
that success is not currently guaranteed.

Reviewed by:	alc, kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D33644

(cherry picked from commit 0d5fac2872)
2022-07-11 00:41:07 -05:00
Doug Moore
b52d35520c Fix clerical error in page alloc
Fix a very recent change that introduced a page accounting error in
case of a reservation being broken.
Reviewed by:	alc
Fixes:	fb38b29b56 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
Differential Revision:	https://reviews.freebsd.org/D33645

(cherry picked from commit 184c63db3c)
2022-07-11 00:41:06 -05:00
Doug Moore
a8d3ef5bc6 vm_page: Remove extra test from page alloc
Extract code from vm_page_alloc_contig_domain into a new function.  Do
so in a way that eliminates a bound-to-fail reservation test after a
reservation is broken by a call from vm_page_alloc_contig_domain.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33551

(cherry picked from commit fb38b29b56)
2022-07-11 00:41:05 -05:00
Doug Moore
86299ec1c4 vm_phys: hide vm_phys_set_pool
It is only called in the file that defines it, so make it static and
remove the declaration from the header.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33688

(cherry picked from commit 8119cdd38b)
2022-07-11 00:41:04 -05:00
Mark Johnston
de0b1239df vm: Fix racy checks for swap objects
Commit 4b8365d752 introduced the ability to dynamically register
VM object types, for use by tmpfs, which creates swap-backed objects.
As a part of this, checks for such objects changed from

  object->type == OBJT_DEFAULT || object->type == OBJT_SWAP

to

  object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0

In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set;
the swap pager sets this flag when converting from OBJT_DEFAULT to
OBJT_SWAP.

A few of these checks are done without the object lock held.  It turns
out that this can result in false negatives since the swap pager
converts objects like so:

  object->type = OBJT_SWAP;
  object->flags |= OBJ_SWAP;

Fix the problem by adding explicit tests for OBJT_SWAP objects in
unlocked checks.

PR:		258932
Fixes:		4b8365d752 ("Add OBJT_SWAP_TMPFS pager")
Reported by:	bdrewery
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit e123264e4d)
2022-07-04 09:06:55 -04:00
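The false negative described above can be modelled with a single snapshot of an object taken between the two stores. The sketch below is a toy model, not kernel code: the struct, the `SK_` constants and both function names are hypothetical, the flag value is a placeholder, and real unlocked code would also have to consider memory ordering.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the object types and the swap flag. */
enum sketch_objtype { SK_OBJT_DEFAULT, SK_OBJT_SWAP, SK_OBJT_VNODE };
#define SK_OBJ_SWAP 0x0200u	/* placeholder value */

struct sketch_object {
	enum sketch_objtype type;
	unsigned int flags;
};

/* Check that relies on the flag alone for swap objects. */
static inline bool
sketch_check_flag_only(const struct sketch_object *obj)
{
	return (obj->type == SK_OBJT_DEFAULT ||
	    (obj->flags & SK_OBJ_SWAP) != 0);
}

/* Check with an explicit test for the swap type added. */
static inline bool
sketch_check_with_type(const struct sketch_object *obj)
{
	return (obj->type == SK_OBJT_DEFAULT ||
	    obj->type == SK_OBJT_SWAP ||
	    (obj->flags & SK_OBJ_SWAP) != 0);
}
```

An object observed mid-conversion has `type == SK_OBJT_SWAP` but no `SK_OBJ_SWAP` flag yet: the flag-only check rejects it, while the check with the explicit type test accepts it.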
Mark Johnston
cc81b8661d vm_fault: Fix a racy copy of page valid bits
We do not hold the object lock or a page busy lock when copying src_m's
validity state.  Prior to commit 45d72c7d7f we marked dst_m as fully
valid.

Use the source object's read lock to ensure that valid bits are not
concurrently cleared.

Reviewed by:	alc, kib
Fixes:		45d72c7d7f ("vm_fault_copy_entry: accept invalid source pages.")
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d0443e2b98)
2022-06-29 10:12:34 -04:00
Mark Johnston
3fe539651a vm_fault: Avoid unnecessary object relocking in vm_fault_copy_entry()
Suggested by:	alc
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 1f88394b7f)
2022-06-29 10:12:34 -04:00
Mark Johnston
c75a5bc2f6 vm_object: Use the vm_object_(set|clear)_flag() helpers
... rather than setting and clearing flags inline.  No functional change
intended.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 630f633f2a)
2022-06-21 08:53:24 -04:00
Gordon Bergling
d44793af7e vm_page: Fix a typo in a source code comment
- s/consistancy/consistency/

(cherry picked from commit f77a88c855)
2022-06-10 14:30:17 +02:00
Gordon Bergling
59f3ff80f5 vm: Fix a common typo in a source code comment
- s/independant/independent/

(cherry picked from commit 860740ae0f)
2022-06-10 14:25:56 +02:00
John Baldwin
0c1258e707 Add a VA_IS_CLEANMAP() macro.
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.

In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions.  Abstracting this check reduces the diff relative
to FreeBSD.  It is perhaps slightly more readable as well.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D28710

(cherry picked from commit 67932460c7)
2022-05-10 10:47:07 -07:00
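A containment macro of this kind is typically a half-open range check. The sketch below assumes the submap is described by a start and an end address; the struct, its field names and the macro name are hypothetical and are not taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a submap as a half-open range [sva, eva). */
struct sketch_submap {
	uintptr_t sva;	/* first address inside the submap */
	uintptr_t eva;	/* first address past the submap */
};

/* True if va lies inside the submap described by *map. */
#define SKETCH_VA_IN_SUBMAP(map, va) \
	((uintptr_t)(va) >= (map)->sva && (uintptr_t)(va) < (map)->eva)
```

Hiding the comparison behind a macro lets a kernel with a different layout change the definition in one place instead of at every call site.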
Mark Johnston
11eac05a40 vm: Move the "vm_wait in early boot" assertion to the proper place
The assertion was added in commit 1771e987ca.  After that, vm_wait()
and friends were refactored such that the actual sleep happens
elsewhere.  Now the assertion condition is not checked when
vm_wait_doms() is called directly, and it is checked even if we are not
going to sleep (because vm_page_count_min_set(wdoms) is false).

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 6fb7c42d59)
2022-04-21 09:18:14 -04:00
Mark Johnston
9a4e701578 vm: Initialize the transient buffer mapping arena with M_WAITOK
The wait flag is passed to UMA when allocating boundary tags for the
initial span, and UMA expects either M_WAITOK or M_NOWAIT to be present.

Reported by:	cperciva
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit f82177b8cf)
2022-04-21 09:18:04 -04:00
Mark Johnston
f9677b7e74 uma: Don't allow a limit to be set in a warm zone
The limit accounting in UMA does not tolerate this.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d53927b0ba)
2022-04-13 08:10:35 -04:00
Gordon Bergling
6265d53f90 memguard(9): Fix two typos in source code comments
- s/comparsion/comparison/

(cherry picked from commit f167c46e79)
2022-04-09 08:08:00 +02:00
Mark Johnston
229eff21b7 uma: Use the correct type for a return value
zone_alloc_bucket() returns a pointer, not a bool.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 54361f9020)
2022-04-06 20:30:45 -04:00
Mark Johnston
13ba1d2836 vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space."  This can be
misleading, as there are other reasons an OOM kill can be triggered.  In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 4a864f624a)
2022-02-28 09:06:58 -05:00
Hans Petter Selasky
6bf4f9268f uma: Add UMA_ZONE_UNMANAGED
Allow a zone to opt out of cache size management.  In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache.  In particular, uma_zone_reclaim()
will still reclaim cached items from an unmanaged zone.

Reviewed by:	hselasky, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34142

(cherry picked from commit 389a3fa693)
2022-02-24 10:59:28 +01:00
Edward Tomasz Napierala
b2db87294a Make vmdaemon timeout configurable
Make vmdaemon timeout configurable, so that one can adjust
how often it runs.

Here's a trick: set this to 1, then run 'limits -m 0 sh',
then run whatever you want with 'ktrace -it XXX', and observe
how the working set changes over time.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D22038

(cherry picked from commit 0f559a9f09)
2022-02-14 19:28:56 +00:00
John Baldwin
1a9f14cfa5 Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD

(cherry picked from commit becaf6433b)
2022-02-16 11:55:37 -05:00
Mark Johnston
a097a58543 fork: Copy the vm_stacktop field into the new vmspace
Fixes:	1811c1e957 ("exec: Reimplement stack address randomization")
Reported by:	pho
Reported by:	syzbot+0446312a51bc13ead834@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 46d35d415a)
2022-02-16 11:55:11 -05:00
Mark Johnston
5fa005e915 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 1811c1e957)
2022-02-16 11:55:03 -05:00
Konstantin Belousov
3261dea72c Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664a.

PR:	261707

(cherry picked from commit b51927b7b0)
2022-02-10 16:56:15 +02:00
Konstantin Belousov
c383935d00 vmmeter(): Fix detection of the named swap objects
(cherry picked from commit 0b8643eaf6)
2022-02-09 02:42:44 +02:00
Konstantin Belousov
1617ca2c16 vm_object: restore handling of shadow_count for all types of objects
(cherry picked from commit 4cf9f5d807)
2022-02-09 02:42:44 +02:00
Rick Macklem
6d95a66f1b vm_object: Make is_object_active() global
(cherry picked from commit cd37afd8b6)
2022-02-09 02:42:44 +02:00
Konstantin Belousov
64e0f75f39 vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
(cherry picked from commit d950c5898a)
2022-02-08 08:42:07 +02:00
Konstantin Belousov
38a7a43505 vm/vm_pager.h: use sys/systm.h header
(cherry picked from commit f4cdb9d7c3)
2022-02-08 08:42:07 +02:00
Konstantin Belousov
78d27f25c7 Use dedicated lock name for pbufs
(cherry picked from commit 531f8cfea0)
2022-02-07 11:38:49 +02:00
Konstantin Belousov
2366d7ce87 vm_pageout_scans: correct detection of active object
(cherry picked from commit 3de96d664a)
2022-01-29 03:10:44 +02:00
Mark Johnston
0883a572d2 uma: Avoid polling for an invalid SMR sequence number
Buckets in an SMR-enabled zone can legitimately be tagged with
SMR_SEQ_INVALID.  This effectively means that the zone destructor (if
any) was invoked on all items in the bucket, and the contained memory is
safe to reuse.  If the first bucket in the full bucket list was tagged
this way, UMA would unnecessarily poll per-CPU state before attempting
to fetch a full bucket from the list.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit a04ce833f9)
2022-01-28 09:13:24 -05:00
Mark Johnston
3f85c51824 swap_pager: uma_zcreate() doesn't fail
Remove always-false checks for UMA zone creation failure.  No functional
change intended.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 43b3b8e52d)
2022-01-18 08:36:13 -05:00
Mark Johnston
d41768d5c1 vm_pageout: Group sysctl variables together with sysctl definitions
Fix some style bugs while here.  No functional change intended.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c4a25e0713)
2022-01-18 08:36:04 -05:00
Dawid Gorecki
16a900ae02 setrlimit: Take stack gap into account.
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the sum of rlim_cur and the stack gap is bigger than
rlim_max, then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516

(cherry picked from commit 889b56c8cd)
2021-12-30 16:24:59 +01:00
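The adjustment described above amounts to adding the gap to the soft limit and clamping the result to the hard limit. A minimal sketch, assuming all three quantities are byte counts; the function name and parameter names are hypothetical, not taken from the commit:

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the soft stack limit enlarged by the stack
 * gap, truncated to the hard limit.  The comparison is arranged so that
 * rlim_cur + gap is never computed when it could overflow.
 */
static inline uint64_t
sketch_adjust_stack_limit(uint64_t rlim_cur, uint64_t rlim_max, uint64_t gap)
{
	if (rlim_cur > rlim_max || gap > rlim_max - rlim_cur)
		return (rlim_max);
	return (rlim_cur + gap);
}
```

With a 1 MiB soft limit, an 8 MiB hard limit and a 64 KiB gap the result is 1 MiB plus 64 KiB; with a soft limit one page below the hard limit the result is clamped to 8 MiB.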
Stephen J. Kiernan
bd7e18a378 Eliminate key press requirement from "show vmopag" command output.
Summary:
One was required to press a key to continue after every 18 lines of
output. This requirement had been in the "show vmopag" command since it
was introduced, which was many years before paging was added to DDB.
With paging, this explicit key check is no longer necessary.

Obtained from:	Juniper Networks, Inc.
MFC after:	1 week

Test Plan:
Run "show vmopag" from db> prompt and see that it does not need additional
keypresses other than the ones needed for the pager.

Subscribers: imp, #contributor_reviews_base

Differential Revision: https://reviews.freebsd.org/D33550

(cherry picked from commit 18048b6e3c)
2021-12-29 14:32:48 -05:00
Doug Moore
dd8ea1c755 vm_reserv: fix zero-boundary error
Handle specially the boundary==0 case of vm_reserv_reclaim_contig,
by turning off boundary adjustment in that case.

Reviewed by:	alc
Tested by:	pho, madpilot

(cherry picked from commit 49fd2d51f0)
2021-12-29 11:23:48 -06:00
Mark Johnston
0fc6eebbf7 vm_fault: Fix vm_fault_populate()'s handling of VM_FAULT_WIRE
vm_map_wire() works by calling vm_fault(VM_FAULT_WIRE) on each page in
the range.  (For largepage mappings, it calls vm_fault() once per large
page.)

A pager's populate method may return more than one page to be mapped.
If VM_FAULT_WIRE is also specified, we'd wire each page in the run, not
just the fault page.  Consider an object with two pages mapped in a
vm_map_entry, and suppose vm_map_wire() is called on the entry.  Then,
the first vm_fault() would allocate and wire both pages, and the second
would encounter a valid page upon lookup and wire it again in the
regular fault handler.  So the second page is wired twice and will be
leaked when the object is destroyed.

Fix the problem by modifying vm_fault_populate() to wire only the fault
page.  Also modify the error handler for pmap_enter(psind=1) to not test
fs->wired, since it must be false.

PR:		260347
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 88642d978a)
2021-12-27 19:36:07 -05:00
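The double wiring in the two-page scenario above can be reproduced with a toy model. This is not the fault handler: the page struct, both functions and the `wire_whole_run` switch are hypothetical and exist only to contrast the behaviour before and after the fix.

```c
#include <stdbool.h>

/* Hypothetical page state: validity and a wire count. */
struct sketch_page {
	bool valid;
	int wire_count;
};

/*
 * Model one wiring fault on page idx.  If the page is already valid, the
 * regular path wires just that page.  Otherwise the pager populates the
 * whole run; wire_whole_run selects whether every populated page is wired
 * (behaviour before the fix) or only the faulting page (after the fix).
 */
static inline void
sketch_fault_wire(struct sketch_page *pages, int npages, int idx,
    bool wire_whole_run)
{
	if (pages[idx].valid) {
		pages[idx].wire_count++;
		return;
	}
	for (int i = 0; i < npages; i++) {
		pages[i].valid = true;
		if (wire_whole_run || i == idx)
			pages[i].wire_count++;
	}
}

/* Model wiring a range: one wiring fault per page. */
static inline void
sketch_wire_range(struct sketch_page *pages, int npages, bool wire_whole_run)
{
	for (int i = 0; i < npages; i++)
		sketch_fault_wire(pages, npages, i, wire_whole_run);
}
```

Wiring a fresh two-page object with `wire_whole_run` true leaves the second page with a wire count of 2; with it false, both pages end with a wire count of 1.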
Jason A. Harmening
fa4e4d55b3 Clean up a couple of MD warts in vm_fault_populate():
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32.  This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping.  For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place.  Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.

--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.

Reviewed by:	kib, markj, bdragon

(cherry picked from commit 8dc8feb53d)
2021-12-27 19:35:55 -05:00
Doug Moore
42f18ad112 Correct type size format error in KASSERT.
Reported by:	jenkins
Fixes:	6f1c890827 vm: Don't break vm reserv that can't meet align reqs

(cherry picked from commit f7aa44763d)
2021-12-23 02:02:42 -06:00
Doug Moore
3b8062cdd5 vm: Don't break vm reserv that can't meet align reqs
Function vm_reserv_test_contig has incorrectly used its alignment
and boundary parameters to find a well-positioned range of empty pages
in a reservation.  Consequently, a reservation could be broken
mistakenly when it was unable to provide a satisfactory set of pages.

Rename the function, correct the errors, and add assertions to detect
the error in case it appears again.

Reviewed by:	alc, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33344

(cherry picked from commit 6f1c890827)
2021-12-23 02:01:17 -06:00
Konstantin Belousov
1791debf4a swapoff: add one more variant of the syscall
For MFC, COMPAT_FREEBSD13 braces were removed.

(cherry picked from commit 5346570276)
2021-12-20 02:29:11 +02:00
Konstantin Belousov
45786883b0 swapoff(2): add a SWAPOFF_FORCE flag
(cherry picked from commit e8dc2ba29c)
2021-12-20 02:29:11 +02:00
Konstantin Belousov
6ceede7d36 swapoff(2): replace special device name argument with a structure
(cherry picked from commit a4e4132fa3)
2021-12-20 02:29:11 +02:00
Doug Moore
0848451a2e Set uninitialized popmap bits in vm_reserv_init
In vm_reserv_init, set all the marker popmap bits,
and not just the bits of the first popmap entry.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33258

(cherry picked from commit 9f32cb5b1c)
2021-12-13 23:09:13 -06:00