4569 commits

Author SHA1 Message Date
Konstantin Belousov
08d995ca8f swapoff_one(): only check free pages count when manually turning swap off
(cherry picked from commit 0190c38b9d)
2021-12-06 02:29:43 +02:00
Mitchell Horne
233ec6b12b minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than always assuming vm_page_dump. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.

To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.

Reviewed by:	kib, markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31992

(cherry picked from commit 10fe6f80a6)
2021-12-03 10:02:03 -04:00
Konstantin Belousov
3a98b98be5 swap_pager: lock vnode in swapdev_strategy()
(cherry picked from commit b19740f4ce)
2021-12-02 04:21:15 +02:00
Konstantin Belousov
4b2caeec43 swapon: extend the region where the swap vnode is locked
(cherry picked from commit 6ddf41faa6)
2021-12-02 04:21:14 +02:00
Konstantin Belousov
81c9a051ea swap pager: lock vnode around VOP_CLOSE()
(cherry picked from commit a6d04f34a4)
2021-12-02 04:21:14 +02:00
Mark Johnston
1556ae1356 vm_page: Remove vm_page_sbusy() and vm_page_xbusy()
They are unused today and cannot be safely used in the face of unlocked
lookup, in which pages may be busied without the object lock held.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib

(cherry picked from commit a2665158d0)
2021-11-29 09:11:37 -05:00
Mark Johnston
cb081566cf vm_page: Consolidate page busy sleep mechanisms
- Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take
  a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix
  up callers.
- Modify vm_page_busy_sleep() to return a status indicating whether the
  object lock was dropped, and fix up callers.
- Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep()
  instead.
- Remove vm_page_sleep_if_(x)busy().

No functional change intended.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib

(cherry picked from commit 87b646630c)
2021-11-29 09:11:29 -05:00
Mark Johnston
fdd27db348 vm: Add a mode to vm_object_page_remove() which skips invalid pages
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks.  In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().

Add a helper, vn_pages_remove_valid(), for use by filesystems.  Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.

PR:		258208
Reviewed by:	avg, kib, sef
Tested by:	pho (part of a larger patch)
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d28af1abf0)
2021-11-29 09:09:28 -05:00
Mark Johnston
0d900a16d0 vm_pager: Optimize an assertion
Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib

(cherry picked from commit b0acc3f11b)
2021-11-22 08:44:08 -05:00
Mark Johnston
ce9c3848ff uma: Fix handling of reserves in zone_import()
Kegs with no items reserved have uk_reserve = 0.  So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted.  Then, rather than go and allocate a fresh slab, we return to
the cache layer.

The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first.  Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.

Fixes:	1b2dcc8c54 ("uma: Avoid depleting keg reserves when filling a bucket")
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 7585c5db25)
2021-11-15 09:07:10 -05:00
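The reserve check fixed in ce9c3848ff can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the field names `uk_reserve` and `ud_free_items` come from the commit message, while the `model_keg` and `model_domain` structs and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the keg and its per-domain state. */
struct model_keg {
	uint32_t uk_reserve;	/* items reserved; 0 means no reserve */
};

struct model_domain {
	uint32_t ud_free_items;	/* free items left in this domain */
};

/* Check as described before the fix: also true when there is no reserve. */
bool
should_stop_old(const struct model_keg *keg, const struct model_domain *dom)
{
	return (keg->uk_reserve >= dom->ud_free_items);
}

/* Check after the fix: only applies when the keg actually has a reserve. */
bool
should_stop_fixed(const struct model_keg *keg, const struct model_domain *dom)
{
	return (keg->uk_reserve != 0 &&
	    keg->uk_reserve >= dom->ud_free_items);
}
```

With `uk_reserve = 0` and `ud_free_items = 0`, the old check is true (so the import stops instead of allocating a fresh slab), while the guarded check is false.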
Mark Johnston
d5ebaa6f8f uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled.  For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.

For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation.  uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.

So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for the current (first-touch) domain.  Specifically,
fall back to a round-robin slab allocation.  This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]

Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful.  In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.

Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again.  Call vm_wait_domain() before retrying.

Reported by:	mjg, tuexen [1]
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit fab343a716)
2021-11-15 09:06:54 -05:00
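The fallback described in d5ebaa6f8f can be modeled in user space as a search over per-domain free-item counts. This is a rough sketch, not the kernel implementation: `MODEL_NDOMAINS`, `model_pick_domain()`, and the `free_items` array are hypothetical, and the model only captures the idea of looking at other domains when a reserve allocation finds nothing in the current one.

```c
#define MODEL_NDOMAINS 4	/* assumption: four NUMA domains */

/*
 * Return the index of a domain that still has a free item, or -1.
 * Without use_reserve only the current (first-touch) domain is checked;
 * with use_reserve the other domains are visited in round-robin order.
 */
int
model_pick_domain(const int free_items[MODEL_NDOMAINS], int cur,
    int use_reserve)
{
	if (free_items[cur] > 0)
		return (cur);
	if (!use_reserve)
		return (-1);	/* caller would have to allocate a fresh slab */
	for (int i = 1; i < MODEL_NDOMAINS; i++) {
		int d = (cur + i) % MODEL_NDOMAINS;

		if (free_items[d] > 0)
			return (d);
	}
	return (-1);
}
```

In this model, a reserve that uma_prealloc() spread across domains remains reachable from a domain whose own share is exhausted, which is the behavior the commit message says consumers expect.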
Gordon Bergling
e3f2519c5c Fix a common typo in sysctl descriptions
- s/maxiumum/maximum/

(cherry picked from commit c28e39c3d6)
2021-11-06 08:52:57 +01:00
Mark Johnston
5dc9004b72 vm_page: Break reservations to handle noobj allocations
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn.  But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.

Reported by:	cy
Reviewed by:	kib (previous version)
Fixes:	b498f71bc5 ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d7acbe481d)
2021-11-03 13:44:47 -04:00
Mark Johnston
f86bda068c Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 84c3922243)
2021-11-03 13:41:40 -04:00
Mark Johnston
fb3ba080a1 Introduce vm_page_alloc_noobj_contig()
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory.  For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 92db9f3bb7)
2021-11-03 13:41:00 -04:00
Mark Johnston
66cb1858f4 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit a4667e09e6)
2021-11-03 13:39:36 -04:00
Mark Johnston
24204bede3 vm_page: Add a new page allocator interface for unnamed pages
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.

This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.

Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function.  No
functional change intended.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit b498f71bc5)
2021-11-03 13:35:25 -04:00
Ryan Stone
8deb5f2f64 Add a VM flag to prevent reclaim on a failed contig allocation
If an M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request.  On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.

In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.

Sponsored by: Dell Inc
Differential Revision:	https://reviews.freebsd.org/D28422
Reviewed by: markj, kib

(cherry picked from commit 660344ca44)
2021-11-03 13:35:16 -04:00
Mark Johnston
bdfb568f8d redzone: Raise a compile error if KASAN is configured
redzone(9) does some munging of the allocation to insert redzones before
and after a valid memory buffer, but KASAN does not know about this and
will raise false positives if both are configured.  Until this is fixed,
do not allow both to be configured.  Note that KASAN provides similar
checking on its own but currently does not force the creation of
redzones for all UMA allocations; this should be addressed as well.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 4e8e26a004)
2021-11-01 10:07:31 -04:00
Mark Johnston
db33d492c8 uma: Fix a few problems with KASAN integration
- Ensure that all items returned by UMA are aligned to
  KASAN_SHADOW_SCALE (8).  This was true in practice since smaller
  alignments are not used by any consumers, but we should enforce it
  anyway.
- Use a non-zero code for marking redzones that appear naturally in
  items that are not a multiple of the scale factor in size.  Currently
  we do not modify keg layouts to force the creation of redzones.
- Use a non-zero code for marking freed per-CPU items, otherwise
  accesses of freed per-CPU items are not detected by the runtime.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit b0dfc48684)
2021-11-01 10:07:04 -04:00
Mark Johnston
28c338b342 realloc: Fix KASAN(9) shadow map updates
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA.  This value is "alloc".  Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map.  Do so using the correct size.

Reported by:	kp
Tested by:	kp
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9a7c2de364)
2021-11-01 10:05:22 -04:00
Mark Johnston
ed66f9c61b kmem: Add KASAN state transitions
Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't
provide a lot of benefit, but since allocations are always a multiple of
the page size we can create a redzone when the allocation request size
is not a multiple of the page size.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 2b914b85dd)
2021-11-01 10:03:11 -04:00
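The redzone arithmetic mentioned in ed66f9c61b can be sketched as follows. This is a user-space illustration under assumptions, not the kmem code: a 4 KiB page size is assumed, and `MODEL_PAGE_SIZE`, `model_round_page()`, and `model_redzone_bytes()` are hypothetical names.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Round a request up to the next multiple of the page size. */
size_t
model_round_page(size_t size)
{
	return ((size + MODEL_PAGE_SIZE - 1) & ~(size_t)(MODEL_PAGE_SIZE - 1));
}

/* Bytes between the end of the request and the end of the allocation. */
size_t
model_redzone_bytes(size_t size)
{
	return (model_round_page(size) - size);
}
```

A request that is already a multiple of the page size leaves no room for a redzone; any other request leaves the tail of its last page available to be marked invalid.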
Mark Johnston
9d95539ffe kstack: Add KASAN state transitions
We allocate kernel stacks using a UMA cache zone.  Cache zones have
KASAN disabled by default, but in this case it makes sense to enable it.

Reviewed by:	andrew

(cherry picked from commit 244f3ec642)
2021-11-01 10:03:02 -04:00
Mark Johnston
82f3e32c39 uma: Add KASAN state transitions
- Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular
  zone should not be sanitized.  This is applied implicitly for NOFREE
  and cache zones.
- Add KASAN callbacks which get invoked:
  1) when a slab is imported into a keg
  2) when an item is allocated from a zone
  3) when an item is freed to a zone
  4) when a slab is freed back to the VM

  In state transitions 1 and 3, memory is poisoned so that accesses will
  trigger a panic.  In state transitions 2 and 4, memory is marked
  valid.
- Disable trashing if KASAN is enabled.  It just adds extra CPU overhead
  to catch problems that are detected by KASAN.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 09c8cb717d)
2021-11-01 10:02:54 -04:00
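The four state transitions listed in 82f3e32c39 can be summarized as a small table. The sketch below is a user-space model only: the enum and `model_poisoned_after()` are hypothetical names, and the numbering follows the commit message.

```c
#include <stdbool.h>

/* Transitions numbered as in the commit message. */
enum model_transition {
	MODEL_SLAB_IMPORTED = 1,	/* slab imported into a keg */
	MODEL_ITEM_ALLOCATED = 2,	/* item allocated from a zone */
	MODEL_ITEM_FREED = 3,		/* item freed to a zone */
	MODEL_SLAB_FREED = 4		/* slab freed back to the VM */
};

/*
 * Return true if the memory is poisoned after the transition, so that
 * an access would be reported; false if it is marked valid.
 */
bool
model_poisoned_after(enum model_transition t)
{
	return (t == MODEL_SLAB_IMPORTED || t == MODEL_ITEM_FREED);
}
```

Memory is therefore accessible only between an allocation and the matching free, or after the slab has left UMA entirely.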
Konstantin Belousov
9b392d0738 sysctl vm.objects: yield if hog
(cherry picked from commit 350fc36b4c)
2021-11-01 02:44:51 +02:00
Konstantin Belousov
5ac0e08ef6 vm.objects_swap: disable reporting some information
(cherry picked from commit 7738118e9a)
2021-11-01 02:44:51 +02:00
Konstantin Belousov
c54be5cfcf Add vm.swap_objects sysctl
(cherry picked from commit 42812ccc96)
2021-11-01 02:44:51 +02:00
Konstantin Belousov
7db438d470 vm_object_list: split sysctl handler in separate function
(cherry picked from commit 1b610624fd)
2021-11-01 02:44:51 +02:00
Mark Johnston
74efe421ea vm_page: Move vm_page_alloc_check() to after page allocator definitions
This way all of the vm_page_alloc_*() allocator functions are grouped
together.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit a23e6a1078)
2021-10-27 09:53:29 -04:00
Mitchell Horne
5794f8c75e minidump: De-duplicate is_dumpable()
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().

Reviewed by:	kib, markj, imp (earlier version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31884

(cherry picked from commit 31991a5a45)
2021-10-15 12:20:48 -03:00
Konstantin Belousov
0b29fd06da vm_fault: do not trigger OOM too early
(cherry picked from commit 174aad047e)
2021-10-10 12:22:58 +03:00
Mark Johnston
e68465ecbf uma: Show the count of free slabs in each per-domain keg's sysctl tree
This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d6e77cda9b)
2021-09-24 09:01:22 -04:00
Mark Johnston
ca85fb7e0b swap_pager: Handle large swap_pager_reserve() requests
This interface is used solely by md(4) when the MD_RESERVE flag is
specified, as in `mdconfig -a -t swap -s 1G -o reserve`.  It
pre-allocates swap blocks for the entire object.

The number of blocks to be reserved is specified as a vm_size_t, but
swp_pager_getswapspace() can allocate at most INT_MAX blocks.  vm_size_t
also seems like the incorrect type to use here since it refers only to
the size of the VM object, not the size of a mapping.  So:
- change the type of "size" in swap_pager_reserve() to vm_pindex_t, and
- clamp the requested number of blocks for a single
  swp_pager_getswapspace() call to INT_MAX.

Reported by:	syzkaller
Reviewed by:	dougm, alc, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 686aa9287c)
2021-09-21 09:38:03 -04:00
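The clamping described in ca85fb7e0b can be sketched in user space as a loop that splits a large request into pieces of at most INT_MAX blocks. This is an illustration under assumptions, not the swap pager code: `model_pindex_t` stands in for vm_pindex_t, both function names are hypothetical, and every request is assumed to be satisfied in full.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t model_pindex_t;	/* stand-in for vm_pindex_t */

/* Size of the next request, clamped to what an int can express. */
int
model_next_chunk(model_pindex_t remaining)
{
	return (remaining > (model_pindex_t)INT_MAX ? INT_MAX :
	    (int)remaining);
}

/* Count the clamped requests needed to cover `size` blocks. */
uint64_t
model_count_requests(model_pindex_t size)
{
	uint64_t calls = 0;

	while (size > 0) {
		size -= (model_pindex_t)model_next_chunk(size);
		calls++;
	}
	return (calls);
}
```

A request for `INT_MAX + 5` blocks is covered by two calls in this model, rather than by a single call whose count does not fit in an int.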
Konstantin Belousov
6a9e83fec1 vm_page_startup: correct calculation of the starting page
(cherry picked from commit bd3a668087)
2021-09-21 16:31:27 +03:00
Konstantin Belousov
62faf669ac vm_phys: do not ignore phys_avail[] segments that do not fit completely into vm_phys segments
(cherry picked from commit 181bfb42fd)
2021-09-21 16:31:27 +03:00
Konstantin Belousov
a686d177a7 Add pmap_vm_page_alloc_check()
(cherry picked from commit 041b7317f7)
2021-08-24 02:21:13 +03:00
Gordon Bergling
337c971838 Fix a few typos in source code comments
- s/becase/because/

(cherry picked from commit fa7a635f7e)
2021-08-19 09:29:50 +02:00
Konstantin Belousov
c7b5abde53 Add vn_lktype_write()
(cherry picked from commit 0ef5eee9d9)
2021-08-12 15:37:54 +03:00
Konstantin Belousov
57184d6a6c Un-staticise vm_page_init_page()
(cherry picked from commit 5b10e79edb)
2021-06-24 05:20:33 +03:00
Mateusz Guzik
90b82ea9a6 vm: add another pager private flag
Contrary to what was done in main, skip the following in order to not
disrupt KBI:
Move OBJ_SHADOWLIST around to let pager flags be next to each other.

(cherry picked from commit 128e25842e)
2021-05-22 18:28:29 +00:00
Konstantin Belousov
03aecce81c tmpfs: dynamically register tmpfs pager
(cherry picked from commit 28bc23ab92)
2021-05-22 12:38:30 +03:00
Konstantin Belousov
2f5321c170 vm: Add KPI to dynamically register pagers
(cherry picked from commit b730fd30b7)
2021-05-22 12:38:30 +03:00
Konstantin Belousov
bf9b8d2ae0 sys/vm: remove several other uses of OBJT_SWAP_TMPFS
(cherry picked from commit 7079449b0b)
2021-05-22 12:38:30 +03:00
Konstantin Belousov
324fbdb27a vm_object_set_memattr(): handle all object types without listing them explicitly
(cherry picked from commit 3e7a11ca21)
2021-05-22 12:38:30 +03:00
Konstantin Belousov
4c4bb6da85 vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops
(cherry picked from commit 00a3fe968b)
2021-05-22 12:38:30 +03:00
Konstantin Belousov
55b68c9ac1 Constify vm_pager-related virtual tables.
(cherry picked from commit d474440ab3)
2021-05-22 12:38:29 +03:00
Konstantin Belousov
2daf5ac2e5 Add OBJT_SWAP_TMPFS pager
(cherry picked from commit 4b8365d752)
2021-05-22 12:38:29 +03:00
Konstantin Belousov
f871a71a82 pagertab: use designated initializers
(cherry picked from commit 0d2dfc6fed)
2021-05-22 12:38:29 +03:00
Konstantin Belousov
6ecea720f3 Style enum obj_type
(cherry picked from commit 838adc533f)
2021-05-22 12:38:29 +03:00
Konstantin Belousov
da0e85f9eb Implement vm_object_vnode() using vm_pager_getvp()
(cherry picked from commit a7c198a24b)
2021-05-22 12:38:29 +03:00