4971 commits

Author SHA1 Message Date
Doug Moore
6b33d9dc46 vm_page: expose page_alloc_after
vm_page_alloc() just calls vm_page_alloc_after(), after it has found
the predecessor of a page parameter. Many callers of vm_page_alloc()
already know that predecessor. Letting them pass that to
vm_page_alloc_after() directly could save a little redundant
calculation.

Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D49103
2025-02-27 10:55:33 -06:00
Doug Moore
bb1dc6cf9c vm_page: define partial page invalidate
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D49096
2025-02-21 19:22:47 -06:00
Doug Moore
2eef41e553 Revert "vm_page: define partial page invalidate"
A negative review arrived as this was being committed, so undo and
reevaluate.

This reverts commit 5611a38d81.
2025-02-21 15:14:54 -06:00
Doug Moore
5611a38d81 vm_page: define partial page invalidate
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D49096
2025-02-21 15:11:13 -06:00
Olivier Certner
16317a174a
vm_page_startup(): Clarify memory lowest, highest and size computation
Change the comment before this block of code, and separate the latter
from the preceding one by an empty line.

Move the loop on phys_avail[] to compute the minimum and maximum memory
physical addresses closer to the initialization of 'low_avail' and
'high_avail', so that it's immediately clear why the loop starts at
2 (and remove the related comment).

While here, fuse the additional loop in the VM_PHYSSEG_DENSE case that
is used to compute the exact physical memory size.

This change suppresses one occurrence of detecting whether at least one
of VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE is defined at compile time, but
there is still another one in PHYS_TO_VM_PAGE().

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48632
2025-02-19 15:13:27 +01:00
Olivier Certner
32e77bcdec
vm_phys_early_startup(): Panic if phys_avail[] is empty
Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48631
2025-02-19 15:13:27 +01:00
Olivier Certner
e1499bfff8
vm_phys_avail_split(): Tolerate split requests at boundaries
Previously, such requests would lead to a panic.  The only caller so far
(vm_phys_early_startup()) actually faces the case where some address can
be one of the chunk's boundaries and has to test it by hand.  Moreover,
a later commit will introduce vm_phys_early_alloc_ex(), which will also
have to deal with such boundary cases.

Consequently, make this function handle boundaries by not splitting the
chunk and returning EJUSTRETURN instead of 0 to distinguish this case
from the "was split" result.

While here, expand the panic message when the address to split is not in
the passed chunk with available details.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48630
2025-02-19 15:13:27 +01:00
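
The boundary-tolerant split described in the commit above can be sketched in userland C. This is a minimal illustration under stated assumptions, not the kernel code: the name avail_split(), its parameters, and the enum values are hypothetical, SPLIT_BOUNDARY stands in for EJUSTRETURN, and the "no space" and "outside" results are assumptions about how such cases could be reported (the commit says the out-of-chunk case panics).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes; SPLIT_DONE is 0, like the "was split" result. */
enum split_result {
	SPLIT_DONE,	/* the chunk was split in two */
	SPLIT_BOUNDARY,	/* stand-in for EJUSTRETURN: nothing to split */
	SPLIT_NOSPACE,	/* assumed: no room for one more chunk */
	SPLIT_OUTSIDE	/* address not in the chunk (the kernel panics) */
};

/*
 * 'avail' holds (start, end) pairs; 'count' is the number of used
 * entries, 'cap' the array capacity, and 'i' the even index of the
 * chunk to split at address 'pa'.
 */
static enum split_result
avail_split(uint64_t *avail, size_t *count, size_t cap, size_t i,
    uint64_t pa)
{
	if (pa < avail[i] || pa > avail[i + 1])
		return (SPLIT_OUTSIDE);
	if (pa == avail[i] || pa == avail[i + 1])
		return (SPLIT_BOUNDARY);
	if (*count + 2 > cap)
		return (SPLIT_NOSPACE);
	/* Shift chunk i and everything after it up by one pair. */
	memmove(&avail[i + 2], &avail[i], (*count - i) * sizeof(avail[0]));
	avail[i + 1] = pa;	/* first half ends at pa */
	avail[i + 2] = pa;	/* second half starts at pa */
	*count += 2;
	return (SPLIT_DONE);
}
```

With a distinct boundary result, a caller no longer has to compare the address against the chunk's bounds by hand before asking for a split.
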
Olivier Certner
291b7bf071
vm_phys_avail_count(): Fix out-of-bounds accesses
On improper termination of phys_avail[] (proper termination being two
consecutive 0s starting at an even index), this function would
(unnecessarily) continue searching for the termination markers even if
the index was out of bounds.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48629
2025-02-19 15:13:27 +01:00
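
A bounded scan for the termination markers, as described in the commit above, might look like the following userland sketch. The function name avail_count(), its parameters, and the -1 return for a missing terminator are assumptions made for illustration; the kernel function may react differently when the array is improperly terminated.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the number of used entries in a phys_avail-style array of
 * (start, end) pairs, i.e. the even index at which two consecutive
 * zeros appear.  The scan never reads at or beyond 'nentries'; if no
 * terminator is found within bounds, -1 is returned (hypothetical
 * error convention).
 */
static int
avail_count(const uint64_t *avail, size_t nentries)
{
	size_t i;

	for (i = 0; i + 1 < nentries; i += 2) {
		if (avail[i] == 0 && avail[i + 1] == 0)
			return ((int)i);
	}
	return (-1);
}
```

The loop condition checks that both entries of the pair are in bounds before either is read, which is the property the fix is about.
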
Olivier Certner
8a14ddcc1d
vm_phys: Check for overlap when adding a segment
Segments are passed by machine-dependent routines, so explicit checks
will make debugging much easier on very weird machines or when someone
is tweaking these machine-dependent routines.  Additionally, this
operation is not performance-sensitive.

For the same reasons, test that we don't reach the maximum number of
physical segments (the compile-time size of the internal storage) in
production kernels (replaces the existing KASSERT()).

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48628
2025-02-19 15:13:26 +01:00
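
The overlap check described in the commit above can be illustrated with a small userland sketch. The struct seg type, the seg_overlaps() name, and the half-open [start, end) convention are assumptions for this example, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment type: covers the half-open range [start, end). */
struct seg {
	uint64_t start;
	uint64_t end;
};

/*
 * Report whether [start, end) intersects any already-registered
 * segment.  A linear scan is acceptable because, as the commit notes,
 * adding a segment is not performance-sensitive.
 */
static bool
seg_overlaps(const struct seg *segs, size_t nsegs, uint64_t start,
    uint64_t end)
{
	size_t i;

	for (i = 0; i < nsegs; i++) {
		if (start < segs[i].end && segs[i].start < end)
			return (true);
	}
	return (false);
}
```

Two segments that merely touch (one ends where the other starts) do not count as overlapping under this convention.
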
Olivier Certner
f30309abcc
vm_phys_add_seg(): Check for bad segments, allow empty ones
A bad specification is if 'start' is strictly greater than 'end', or
bounds are not page aligned.

The latter was already tested under INVARIANTS, but will now also be on
production kernels.  The reason is that vm_phys_early_startup() pours
early segments into the final phys_segs[] array via vm_phys_add_seg(),
but vm_phys_early_add_seg() did not check their validity.  Checking
segments once and for all in vm_phys_add_seg() avoids duplicating
validity tests and is possible since early segments are not used before
being poured into phys_segs[].  Finally, vm_phys_add_seg() is not
performance critical.

Allow empty segments and discard them (silently, unless 'bootverbose' is
true), as vm_page_startup() was testing for this case before calling
vm_phys_add_seg(), and we felt the same test in vm_phys_early_startup()
was due before calling vm_phys_add_seg().  As a consequence, remove the
empty segment test from vm_page_startup().

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48627
2025-02-19 15:13:26 +01:00
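
The three outcomes described in the commit above (bad, empty, valid) can be sketched as a classification function. The names seg_classify(), enum seg_verdict, and EX_PAGE_MASK are hypothetical, and a 4 KiB page size is assumed; in the kernel a bad segment would be a fatal error rather than a return value.

```c
#include <stdint.h>

#define EX_PAGE_MASK	0xfffULL	/* assumed 4 KiB pages */

enum seg_verdict {
	SEG_OK,		/* add the segment */
	SEG_EMPTY,	/* discard (the commit does so silently
			   unless 'bootverbose' is set) */
	SEG_BAD		/* 'start' > 'end' or unaligned bounds */
};

static enum seg_verdict
seg_classify(uint64_t start, uint64_t end)
{
	if (start > end)
		return (SEG_BAD);
	if ((start & EX_PAGE_MASK) != 0 || (end & EX_PAGE_MASK) != 0)
		return (SEG_BAD);
	if (start == end)
		return (SEG_EMPTY);
	return (SEG_OK);
}
```

Doing the check once at the point where segments are added means that callers do not need their own copies of the validity tests.
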
Olivier Certner
125ef4e041
vm_phys_avail_check(): Check index parity, fix panic messages
The passed index must be the start of a chunk in phys_avail[], so must
be even.  Test for that and print a separate panic message.

While here, fix panic messages: In one, the wrong chunk boundary was
printed, and in another, the desired but not the actual condition was
printed, possibly leading to confusion.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48626
2025-02-19 15:13:21 +01:00
Doug Moore
fa462b8b8e vm_object: drop pointless assignment
An assignment in collapse_scan() has become useless because, on every
path, another assignment to that variable overrides it before that
variable is read.  Another assignment can be avoided sometimes, so
move it down in the loop to where it's really necessary.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D49017
2025-02-15 12:09:26 -06:00
Doug Moore
ee511f83b3 vm_reserv: use default pool for free page removal.
Differential Revision:	https://reviews.freebsd.org/D45409
2025-02-03 15:58:17 -06:00
Jessica Clarke
4015ff43cb vm: Fix overflow issues in vm_page_startup
Firstly, pagecount is a u_long so we should ensure j is the same for the
sake of 64-bit systems. Secondly, ptoa is just a macro, and does not
cast its argument, so in order to handle PAE systems correctly we need
to cast j to vm_paddr_t (the type of startp).

Fixes:	0078df5f02 ("vm_phys: reduce touching of page->pool fields")
2025-01-31 18:37:27 +00:00
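
The second issue in the commit above, a shift macro that does not cast its argument, can be reproduced in userland. In this sketch ex_ptoa() is a local macro written to behave as the commit describes, ex_paddr_t stands in for vm_paddr_t on a PAE system, and the 32-bit index type is an assumption chosen to make the truncation visible.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT	12
/* Shifts in the type of its argument; performs no widening cast. */
#define ex_ptoa(x)	((x) << EX_PAGE_SHIFT)

typedef uint64_t ex_paddr_t;	/* stand-in for a 64-bit vm_paddr_t */

/* The shift happens in 32 bits, so high bits are lost before the add. */
static ex_paddr_t
page_addr_uncast(ex_paddr_t startp, uint32_t j)
{
	return (startp + ex_ptoa(j));
}

/* Widening the index first keeps the full physical address. */
static ex_paddr_t
page_addr_cast(ex_paddr_t startp, uint32_t j)
{
	return (startp + ex_ptoa((ex_paddr_t)j));
}
```

For an index of 0x100000 pages (4 GiB worth of 4 KiB pages), the uncast version wraps the offset to zero while the cast version yields 0x100000000.
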
Doug Moore
0078df5f02 vm_phys: reduce touching of page->pool fields
Change the usage of the pool field in vm_page structs.

Currently, every page belongs to a pool, and the pool field identifies
that pool, whether the page is allocated or free.

With this change, the pool field of the first page of a free block is
used by the buddy allocator to identify its pool, but the buddy
allocator makes no guarantees about the pool field value for allocated
pages. The buddy allocator requires that a pool parameter be passed as
part of freeing memory. A function that allocates memory may use the
pool field of a page to record what pool to pass as that parameter
when the memory is freed, but might not need to do so for every
allocated page.

Suggested by:	alc
Reviewed by:	markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D45409
2025-01-29 03:13:17 -06:00
Doug Moore
18c47eab72 Revert "vm_phys: reduce touching of page->pool fields". Pho reports, and I have
verified, that it sometimes crashes the kernel on the mmap41.sh stress test.

This reverts commit c669b08bd8.
2025-01-23 10:57:23 -06:00
Doug Moore
c669b08bd8 vm_phys: reduce touching of page->pool fields
Change the usage of the pool field in vm_page structs.

Currently, every page belongs to a pool, and the pool field identifies
that pool, whether the page is allocated or free.

With this change, the pool field of the first page of a free block is
used by the buddy allocator to identify its pool, but the buddy
allocator makes no guarantees about the pool field value for allocated
pages. The buddy allocator requires that a pool parameter be passed as
part of freeing memory. A function that allocates memory may use the
pool field of a page to record what pool to pass as that parameter
when the memory is freed, but might not need to do so for every
allocated page.

Suggested by:	alc
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D45409
2025-01-21 16:35:25 -06:00
Andrew Gallatin
cf90707467 Introduce the UMA_ZONE_NOTRIM uma zone type
The ktls buffer zone allocates 16k contiguous buffers, and often needs
to call vm_page_reclaim_contig_domain_ext() to free up contiguous
memory, which can be expensive.  Web servers which have a daily
pattern of peaks and troughs end up having UMA trim the
ktls_buffer_zone when they are in their trough, and end up re-building
it on the way to their peak.

Rather than calling vm_page_reclaim_contig_domain_ext() multiple times
on a daily basis, let's mark the ktls_buffer_zone with a new UMA flag,
UMA_ZONE_NOTRIM.  This disables UMA_RECLAIM_TRIM on the zone, but
allows UMA_RECLAIM_DRAIN* operations, so that if we become extremely
short of memory (vm_page_count_severe()), the uma reclaim worker can
still free up memory.

Note that UMA_ZONE_UNMANAGED already exists, but can never be drained
or trimmed, so it may hold on to memory during times of severe memory
pressure.  Using UMA_ZONE_NOTRIM rather than UMA_ZONE_UNMANAGED is an
attempt to keep this zone more reactive in the face of severe memory
pressure.

Sponsored by: Netflix
Reviewed by: jhb, kib, markj (via slack)
Differential Revision: https://reviews.freebsd.org/D48451
2025-01-15 12:23:00 -05:00
Mark Johnston
55b343f4f9 vm_pageout: Add a chicken switch for multithreaded PQ_INACTIVE scanning
Right now we have the vm.pageout_cpus_per_thread tunable which controls
the number of threads to start up per CPU per NUMA domain, but after
booting, it's not possible to disable multi-threaded scanning.

There is at least one workload where this mechanism doesn't work well;
let's make it possible to disable it without a reboot, to simplify
troubleshooting.

Reviewed by:	dougm, kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48377
2025-01-09 14:54:10 +00:00
Mark Johnston
fe1165df4b vm_pageout: Make vmd_oom a bool
No functional change intended.

Reviewed by:	dougm, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48376
2025-01-09 14:53:37 +00:00
Konstantin Belousov
fae0cc5fd8 vm/vm_map.h: drop vm_flags_t
Suggested by:	alc
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47934
2024-12-09 05:27:44 +02:00
Konstantin Belousov
d939fd2d45 vm_map: convert several bool members into flags
Extend flags to u_int.
Move system_map and needs_wakeup bools into flags.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47934
2024-12-09 05:27:44 +02:00
Konstantin Belousov
b4431e9554 vm/vm_map.h: extend number of digits in vm_map flags definitions
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47934
2024-12-09 05:27:44 +02:00
Konstantin Belousov
c5b19cef36 vm_map: wrap map->system_map checks into wrapper
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47934
2024-12-09 05:27:44 +02:00
Konstantin Belousov
6ed68e6f5d vm_map: overlap system map mutex and user map sx
This saves 616-584 = 32 bytes per struct vmspace on amd64, which allows
packing 7 vmspaces per page vs. 6 for the non-overlapping layout.

I used the anonymous union member feature to avoid too much churn.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47934
2024-12-09 05:27:44 +02:00
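
The arithmetic and the anonymous-union technique from the commit above can be sketched as follows. All names here are hypothetical, the 32-byte lock sizes are assumptions chosen to match the stated saving, and the per-page count assumes a 4096-byte page while ignoring any allocator overhead.

```c
#include <stddef.h>

#define EX_PAGE_SIZE	4096	/* assumed amd64 base page size */

/* Hypothetical stand-ins for the two lock types; sizes are assumed. */
struct ex_mtx { char opaque[32]; };
struct ex_sx  { char opaque[32]; };

/* Both locks stored side by side. */
struct ex_map_separate {
	struct ex_mtx system_mtx;
	struct ex_sx lock;
	unsigned int flags;
};

/*
 * A map uses only one of the two locks, so they can share storage.
 * The anonymous union keeps member access unchanged (map->lock,
 * map->system_mtx), which limits churn in existing code.
 */
struct ex_map_overlapped {
	union {
		struct ex_mtx system_mtx;
		struct ex_sx lock;
	};
	unsigned int flags;
};

/* Whole objects of the given size that fit in one page. */
static size_t
ex_objs_per_page(size_t objsize)
{
	return (EX_PAGE_SIZE / objsize);
}
```

Under these assumptions, 4096 / 616 gives 6 objects per page and 4096 / 584 gives 7, which matches the figures quoted in the commit message.
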
Doug Moore
c1d12b925b vm_page: pass page to iter_remove
Pass the to-be-freed page to vm_page_iter_remove as a parameter,
rather than computing it from the iterator parameter, to improve
performance.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D47730
2024-12-08 14:30:22 -06:00
Konstantin Belousov
d302c05393 vm: rename MAP_STACK_GROWS_DOWN to MAP_STACK_AREA
Reviewed by:	alc, dougm, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47892
2024-12-06 09:46:59 +02:00
Konstantin Belousov
0304675486 vm_map: remove _GN suffix from MAP_ENTRY_STACK_GAP and MAP_CREATE_STACK_GAP symbols
Reviewed by:	alc, dougm, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47892
2024-12-06 09:46:55 +02:00
Konstantin Belousov
17e624ca85 sys/vm: remove support for growing-up stacks
Reviewed by:	alc, dougm, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D47892
2024-12-06 09:46:49 +02:00
Alan Cox
c296ac7e0f vm: Optimize page rename
Rename vm_page_rename() to vm_page_iter_rename() to reflect its
reimplementation using iterators, and pass the page to this function
rather than spending clock cycles looking it up.  Change its return
value from 0/1 to a bool.

Reviewed by:	dougm, markj
Differential Revision:	https://reviews.freebsd.org/D47829
2024-11-30 02:59:15 -06:00
Doug Moore
40c1672e88 swap_pager: fix seek_data with invalid first page
Correct swap_pager_seek_data so that, when the first lookup finds no
valid pages, second and subsequent lookups are attempted anyway.

This was broken by db08b0b04d.

Reported by:	marklmi@yahoo.com
Reviewed by:	kib
Tested by:	marklmi@yahoo.com
Fixes:	db08b0b04d tmpfs_vnops: move swap work to swap_pager
Differential Revision:	https://reviews.freebsd.org/D47767
2024-11-26 12:12:08 -06:00
Doug Moore
ff4c19bb54 vm_page: pass page to iter_free
Pass the to-be-freed page to vm_page_iter_free as a parameter, rather
than computing it from the iterator parameter, to improve performance.

Sort declarations of page_iter functions in vm_page.h.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D47727
2024-11-25 02:03:34 -06:00
Konstantin Belousov
7fbc896e28 vm_page.c: remove transiently defined vm_page_free_toq_impl() prototype
Sponsored by:	The FreeBSD Foundation
2024-11-23 12:02:00 +02:00
Mark Johnston
4efe531c9d buf: Add a runningbufclaim() helper
No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47696
2024-11-22 14:03:40 +00:00
Doug Moore
38e3125d6d device_pager: use iterators to free device pages
Change cdev_mgtdev_page_free_page to take an iterator, rather than an
object and page, so that removing the page from the object radix tree
can take advantage of locality with iterators. Define a
general-purpose function to free all pages, which can be used in
several places.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D47692
2024-11-21 15:49:30 -06:00
Doug Moore
18a8f4e586 vm_page: correct page iterator patch
The previous change committed a preliminary version of the change to
use iterators to free page sequences.  This updates to what was
intended to be the final version.

Reviewed by:	markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D46724
2024-11-20 12:00:57 -06:00
Doug Moore
5b78ff8307 vm_page: remove pages with iterators
Use pctrie iterators for removing some page sequences from radix
trees, to avoid repeated searches from the tree root.

Rename vm_page_object_remove to vm_page_remove_radixdone, and remove
from it the responsibility for removing a page from its radix tree,
and pass that responsibility on to its callers.

For one of those callers, vm_page_rename, pass a pages pctrie_iter,
rather than a page, and use the iterator to remove the page from its
radix tree.

Define functions vm_page_iter_remove() and vm_page_iter_free() that
are like vm_page_remove() and vm_page_free(), respectively, except
that they take an iterator as parameter rather than a page, and use
the iterator to remove the page from the radix tree instead of
searching the radix tree. Function vm_page_iter_free() assumes that
the page is associated with an object, and calls
vm_page_free_object_prep to do the part of vm_page_free_prep that is
object-related.

In functions vm_object_split and vm_object_collapse_scan, use a
pctrie_iter to walk over the pages of the object, and use
vm_page_rename and vm_radix_iter_remove to modify the radix tree without
searching for pages.  In vm_object_page_remove and _kmem_unback, use a
pctrie_iter and vm_page_iter_free to remove the page from the radix
tree.

Reviewed by:	markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D46724
2024-11-20 11:54:20 -06:00
Alan Cox
8c8d36b9d1 vm: static-ize vm_page_alloc_after()
This function is only intended for the internal use of the VM system.

Reviewed by:	dougm, kib, markj
Differential Revision:	https://reviews.freebsd.org/D47644
2024-11-17 12:19:00 -06:00
Doug Moore
f334c0b8b3 vm_page: use iterators in alloc_contig_domain
Restructure a bit of code to allow vm_page_alloc_contig_domain to use
pctrie iterators for lookup and insertion into the object radix tree,
to improve performance.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D47036
2024-11-16 13:15:05 -06:00
Mark Johnston
d11d407aee swap_pager: Ensure that swapoff puts swapped-in pages in page queues
Readahead/behind pages are handled by the swap pager, but the get_pages
caller is responsible for putting fetched pages into queues (or wiring
them beforehand).

Note that the VM object lock prevents the newly queued page from being
immediately reclaimed in the window before it is marked dirty by
swap_pager_swapoff_object().

Reported by:	pho
Tested by:	pho
Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47526
2024-11-13 14:14:32 +00:00
Doug Moore
f3895e983c vm_radix: Add iter lookup_le interface
Add a function to the vm_radix interface to lookup the greatest page
less than or equal to some given page.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D47046
2024-11-09 13:32:58 -06:00
Konstantin Belousov
580340dbda vm_object: do not assume that un_pager.devp.dev is cdev
It is a subtype-specific handle.  Mark OBJT_DEVICE objects that do fill
cdev into the handle with a new object flag, OBJ_CDEVH.

PR:	282533
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47443
2024-11-06 02:11:00 +02:00
Konstantin Belousov
f0c07fe3d0 device_pager: rename the un_pager.devp.dev field to handle
because it is not necessarily a struct cdev *.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47443
2024-11-06 02:10:59 +02:00
Alan Cox
2001bef84b vm: Eliminate unnecessary lock asserts
There is no actual need for the VM object to be locked when initializing
a VM page iterator.

Reviewed by:	dougm
Differential Revision:	https://reviews.freebsd.org/D47298
2024-10-27 14:03:52 -05:00
Doug Moore
39f6d1e7f8 swap_pager: iter in haspage, lookup, getpages
Use pctrie iterators for swblk traversal in more swap_pager
functions: swap_pager_haspage, swp_pager_meta_lookup, and
swap_pager_getpages.

Reported by:	markj
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D47232
2024-10-26 13:03:40 -05:00
Doug Moore
faa9356f97 swap_pager: fix seek_hole assert
Moving code from tmpfs to swap_pager introduced another WLOCKED object
assert that should have been an RLOCKED object assert.  Fix it.
2024-10-24 18:08:32 -05:00
Doug Moore
02e85d1c8a swap_pager: fix assert in seek_data
An assertion that an object was write-locked should be instead an
assertion that the object is read locked.

Reported by:	Jenkins
Fixes:	 db08b0b04d tmpfs_vnops: move swap work to swap_pager
Differential Revision:	https://reviews.freebsd.org/D47278
2024-10-24 18:04:19 -05:00
Doug Moore
db08b0b04d tmpfs_vnops: move swap work to swap_pager
Two functions in tmpfs_vnops.c use an interface provided by
swap_pager.c. Move most of the implementation of those functions to
swap_pager.c so that they can be implemented more effectively, with
access to implementation details of the swap pager.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D47212
2024-10-24 14:24:49 -05:00
Doug Moore
34951b0b9e swap_pager: move scan_all_shadowed, use iterators
Move vm_object_scan_all_shadowed from vm_object.c to swap_pager.c, and
rename it. In the moved function, use vm_page and swblk iterators to
advance through the objects. Avoid checking a backing page for
busyness or validity more than once, or when it is beyond the upper
bound of the scan.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D47150
2024-10-23 21:30:45 -05:00
Mark Johnston
6a07e67fb7 vm_meter: Fix laundry accounting
Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
Otherwise, on systems with no swap, the total amount of memory visible
to tools like top(1) decreases.

It doesn't seem very useful to have a dedicated counter for unswappable
pages, and updating applications accordingly would be painful, so just
lump them in with laundry for now.

PR:		280846
Reviewed by:	bnovkov, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47216
2024-10-22 12:48:43 +00:00