Remove always-false checks for UMA zone creation failure. No functional
change intended.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 43b3b8e52d)
Fix some style bugs while here. No functional change intended.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c4a25e0713)
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.
Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.
PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
(cherry picked from commit 889b56c8cd)
Summary:
One was required to press a key to continue after every 18 lines of
output. This requirement had been in the "show vmopag" command since it
was introduced, which was many years before paging was added to DDB.
With paging, this explict key check is no longer necessary.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
Test Plan:
Run "show vmopag" from db> prompt and see that it does not need additional
keypresses other than the ones needed for the pager.
Subscribers: imp, #contributor_reviews_base
Differential Revision: https://reviews.freebsd.org/D33550
(cherry picked from commit 18048b6e3c)
Handle specially the boundary==0 case of vm_reserv_reclaim_config,
by turning off boundary adjustment in that case.
Reviewed by: alc
Tested by: pho, madpilot
(cherry picked from commit 49fd2d51f0)
vm_map_wire() works by calling vm_fault(VM_FAULT_WIRE) on each page in
the rage. (For largepage mappings, it calls vm_fault() once per large
page.)
A pager's populate method may return more than one page to be mapped.
If VM_FAULT_WIRE is also specified, we'd wire each page in the run, not
just the fault page. Consider an object with two pages mapped in a
vm_map_entry, and suppose vm_map_wire() is called on the entry. Then,
the first vm_fault() would allocate and wire both pages, and the second
would encounter a valid page upon lookup and wire it again in the
regular fault handler. So the second page is wired twice and will be
leaked when the object is destroyed.
Fix the problem by modify vm_fault_populate() to wire only the fault
page. Also modify the error handler for pmap_enter(psind=1) to not test
fs->wired, since it must be false.
PR: 260347
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 88642d978a)
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32. This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping. For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place. Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.
--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.
Reviewed by: kib, markj, bdragon
(cherry picked from commit 8dc8feb53d)
Function vm_reserv_test_contig has incorrectly used its alignment
and boundary parameters to find a well-positioned range of empty pages
in a reservation. Consequently, a reservation could be broken
mistakenly when it was unable to provide a satisfactory set of pages.
Rename the function, correct the errors, and add assertions to detect
the error in case it appears again.
Reviewed by: alc, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33344
(cherry picked from commit 6f1c890827)
In vm_reserv_init, set all the marker popmap bits in vm_reserv_init,
and not just the bits of the first popmap entry.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33258
(cherry picked from commit 9f32cb5b1c)
A page must not become invalid while vm_fault_soft_fast() is attempting
to map unbusied pages for reading.
Note that all callers hold the object write lock already, and
vm_page_set_invalid() asserts the object write lock.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 39a7396f5d)
Rather than overloading the meanings of the Mach statuses, introduce a
new set for use internally in the fault code. This makes the control
flow easier to follow and provides some extra error checking when a
fault status variable is used in a switch statement.
vm_fault_lookup() and vm_fault_relookup() continue to use Mach statuses
for now, as there isn't much benefit to converting them and they
effectively pass through a status from vm_map_lookup().
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
(cherry picked from commit f1b642c255)
This makes it easier to factor out pieces of vm_fault(). No functional
change intended.
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
(cherry picked from commit 45c09a74d6)
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
(cherry picked from commit 10fe6f80a6)
They are unused today and cannot be safely used in the face of unlocked
lookup, in which pages may be busied without the object lock held.
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
(cherry picked from commit a2665158d0)
- Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take
a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix
up callers.
- Modify vm_page_busy_sleep() to return a status indicating whether the
object lock was dropped, and fix up callers.
- Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep()
instead.
- Remove vm_page_sleep_if_(x)busy().
No functional change intended.
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
(cherry picked from commit 87b646630c)
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks. In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().
Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.
PR: 258208
Reviewed by: avg, kib, sef
Tested by: pho (part of a larger patch)
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d28af1abf0)
Kegs with no items reserved have uk_reserve = 0. So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted. Then, rather than go and allocate a fresh slab, we return to
the cache layer.
The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first. Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.
Fixes: 1b2dcc8c54 ("uma: Avoid depleting keg reserves when filling a bucket")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 7585c5db25)
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled. For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.
For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation. uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.
So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for current (first-touch) domain. Specifically,
fall back to a round-robin slab allocation. This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]
Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful. In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.
Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again. Call vm_wait_domain() before retrying.
Reported by: mjg, tuexen [1]
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
(cherry picked from commit fab343a716)
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn. But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.
Reported by: cy
Reviewed by: kib (previous version)
Fixes: b498f71bc5 ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d7acbe481d)
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory. For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 92db9f3bb7)
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a4667e09e6)
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.
This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.
Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function. No
functional change intended.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b498f71bc5)
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request. On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.
In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.
Sponsored by: Dell Inc
Differential Revision: https://reviews.freebsd.org/D28422
Reviewed by: markj, kib
(cherry picked from commit 660344ca44)
redzone(9) does some munging of the allocation to insert redzones before
and after a valid memory buffer, but KASAN does not know about this and
will raise false positives if both are configured. Until this is fixed,
do not allow both to be configured. Note that KASAN provides similar
checking on its own but currently does not force the creation of
redzones for all UMA allocations; this should be addressed as well.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4e8e26a004)
- Ensure that all items returned by UMA are aligned to
KASAN_SHADOW_SCALE (8). This was true in practice since smaller
alignments are not used by any consumers, but we should enforce it
anyway.
- Use a non-zero code for marking redzones that appear naturally in
items that are not a multiple of the scale factor in size. Currently
we do not modify keg layouts to force the creation of redzones.
- Use a non-zero code for marking freed per-CPU items, otherwise
accesses of freed per-CPU items are not detected by the runtime.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b0dfc48684)
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.
Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9a7c2de364)
Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't
provide a lot of benefit, but since allocations are always a multiple of
the page size we can create a redzone when the allocation request size
is not a multiple of the page size.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 2b914b85dd)
We allocate kernel stacks using a UMA cache zone. Cache zones have
KASAN disabled by default, but in this case it makes sense to enable it.
Reviewed by: andrew
(cherry picked from commit 244f3ec642)
- Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular
zone should not be sanitized. This is applied implicitly for NOFREE
and cache zones.
- Add KASAN call backs which get invoked:
1) when a slab is imported into a keg
2) when an item is allocated from a zone
3) when an item is freed to a zone
4) when a slab is freed back to the VM
In state transitions 1 and 3, memory is poisoned so that accesses will
trigger a panic. In state transitions 2 and 4, memory is marked
valid.
- Disable trashing if KASAN is enabled. It just adds extra CPU overhead
to catch problems that are detected by KASAN.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 09c8cb717d)
This way all of the vm_page_alloc_*() allocator functions are grouped
together.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a23e6a1078)
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
(cherry picked from commit 31991a5a45)
This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d6e77cda9b)