vm_page_alloc() just calls vm_page_alloc_after() after it has found
the predecessor of its page-index parameter. Many callers of vm_page_alloc()
already know that predecessor. Letting them pass that to
vm_page_alloc_after() directly could save a little redundant
calculation.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D49103
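A minimal sketch of the idea, using a hypothetical sorted list. toy_object, toy_page and all helpers are invented for illustration; the real code uses the object's radix tree. A caller that already holds the predecessor calls the _after variant directly and skips the search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an object's pages in a list sorted by index. */
struct toy_page {
	unsigned long pindex;
	struct toy_page *next;
};

struct toy_object {
	struct toy_page *head;
};

/* Find the last page whose index is less than pindex (NULL if none). */
static struct toy_page *
toy_find_pred(struct toy_object *obj, unsigned long pindex)
{
	struct toy_page *p, *pred = NULL;

	for (p = obj->head; p != NULL && p->pindex < pindex; p = p->next)
		pred = p;
	return (pred);
}

/* Insert m right after mpred; no search, the caller supplies mpred. */
static void
toy_insert_after(struct toy_object *obj, struct toy_page *m,
    struct toy_page *mpred)
{
	if (mpred == NULL) {
		m->next = obj->head;
		obj->head = m;
	} else {
		m->next = mpred->next;
		mpred->next = m;
	}
}

/* Convenience wrapper: pays for the predecessor search on every call. */
static void
toy_insert(struct toy_object *obj, struct toy_page *m)
{
	toy_insert_after(obj, m, toy_find_pred(obj, m->pindex));
}
```

Inserting page 2 right after a known page 1 with toy_insert_after() needs no list walk at all.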
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D49096
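A sketch of what such a shared helper does, assuming the page is already mapped at an accessible address and pages are 4 KiB. toy_zero_partial_page() and TOY_PAGE_SIZE are hypothetical names, not the interface added to vm_page.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/*
 * Zero bytes [base, base + size) of one page and leave the rest of the
 * page untouched. Returns 0 on success, -1 if the range does not fit.
 */
static int
toy_zero_partial_page(unsigned char *page, size_t base, size_t size)
{
	if (base > TOY_PAGE_SIZE || size > TOY_PAGE_SIZE - base)
		return (-1);
	memset(page + base, 0, size);
	return (0);
}
```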
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D49096
Change the comment before this block of code, and separate the latter
from the preceding one by an empty line.
Move the loop on phys_avail[] to compute the minimum and maximum memory
physical addresses closer to the initialization of 'low_avail' and
'high_avail', so that it's immediately clear why the loop starts at
2 (and remove the related comment).
While here, fuse the additional loop in the VM_PHYSSEG_DENSE case that
is used to compute the exact physical memory size.
This change removes one of the compile-time checks for whether at least
one of VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE is defined, but another one
remains in PHYS_TO_VM_PAGE().
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48632
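A sketch of the fused loop, assuming phys_avail[] holds at least one [start, end) pair and is terminated by two zero entries. The names are hypothetical. Seeding the results from the first chunk is what lets the loop start at index 2.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_paddr_t;	/* stand-in for vm_paddr_t */

/* One pass: lowest start, highest end and total size of all chunks. */
static void
toy_phys_avail_bounds(const toy_paddr_t *avail, toy_paddr_t *low,
    toy_paddr_t *high, toy_paddr_t *total)
{
	int i;

	/* Seed with the first chunk, so the loop can start at index 2. */
	*low = avail[0];
	*high = avail[1];
	*total = avail[1] - avail[0];
	for (i = 2; avail[i + 1] != 0; i += 2) {
		if (avail[i] < *low)
			*low = avail[i];
		if (avail[i + 1] > *high)
			*high = avail[i + 1];
		*total += avail[i + 1] - avail[i];
	}
}
```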
Previously, such requests would lead to a panic. The only caller so far
(vm_phys_early_startup()) actually faces the case where some address can
be one of the chunk's boundaries and has to test it by hand. Moreover,
a later commit will introduce vm_phys_early_alloc_ex(), which will also
have to deal with such boundary cases.
Consequently, make this function handle boundaries by not splitting the
chunk and returning EJUSTRETURN instead of 0 to distinguish this case
from the "was split" result.
While here, expand the panic message emitted when the address to split
is not in the passed chunk so that it includes the available details.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48630
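A sketch of the new return convention on a hypothetical single-chunk structure (the real function works on phys_avail[] entries). TOY_EJUSTRETURN stands in for the kernel's EJUSTRETURN, and returning -1 stands in for the panic on an out-of-chunk address.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EJUSTRETURN	(-2)	/* stand-in for the kernel's EJUSTRETURN */

struct toy_chunk {
	uint64_t start, end;	/* [start, end) */
};

/*
 * Split *c at addr into *c = [start, addr) and *tail = [addr, end).
 * Returns 0 if split, TOY_EJUSTRETURN if addr is a chunk boundary
 * (nothing to split), or -1 if addr lies outside the chunk.
 */
static int
toy_chunk_split(struct toy_chunk *c, uint64_t addr, struct toy_chunk *tail)
{
	if (addr < c->start || addr > c->end)
		return (-1);
	if (addr == c->start || addr == c->end)
		return (TOY_EJUSTRETURN);
	tail->start = addr;
	tail->end = c->end;
	c->end = addr;
	return (0);
}
```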
If phys_avail[] was improperly terminated (i.e., it lacked two
consecutive zeroes starting at an even index), this function would
needlessly continue searching for the termination markers even after
the index went out of bounds.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48629
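A sketch of a bounded search, assuming the array length n is known (in the kernel it is a compile-time constant). toy_phys_avail_end() is hypothetical. The i + 1 < n test stops the scan at the end of the array even when no terminator is found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the terminating pair (two zeroes starting at an
 * even index), or -1 if none exists. Never reads past avail[n - 1].
 */
static long
toy_phys_avail_end(const uint64_t *avail, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		if (avail[i] == 0 && avail[i + 1] == 0)
			return ((long)i);
	}
	return (-1);
}
```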
Segments are passed by machine-dependent routines, so explicit checks
will make debugging much easier on very weird machines or when someone
is tweaking these machine-dependent routines. Additionally, this
operation is not performance-sensitive.
For the same reasons, test in production kernels that we do not reach
the maximum number of physical segments (the compile-time size of the
internal storage); this replaces the existing KASSERT().
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48628
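A minimal sketch of a capacity check that, unlike an assertion, stays active in production builds. TOY_MAX_SEGS, toy_add_seg() and toy_panic() are hypothetical stand-ins for the real storage limit, vm_phys_add_seg() and panic().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_SEGS 4	/* compile-time capacity (hypothetical value) */

struct toy_seg {
	uint64_t start, end;
};

static struct toy_seg toy_segs[TOY_MAX_SEGS];
static int toy_nsegs;

/* Stand-in for panic(): not compiled out the way assert() can be. */
static void
toy_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

static void
toy_add_seg(uint64_t start, uint64_t end)
{
	if (toy_nsegs >= TOY_MAX_SEGS)
		toy_panic("too many physical segments");
	toy_segs[toy_nsegs].start = start;
	toy_segs[toy_nsegs].end = end;
	toy_nsegs++;
}
```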
A specification is bad if 'start' is strictly greater than 'end' or if
the bounds are not page-aligned.
The latter was already tested under INVARIANTS, but will now also be
tested on production kernels. The reason is that vm_phys_early_startup() pours
early segments into the final phys_segs[] array via vm_phys_add_seg(),
but vm_phys_early_add_seg() did not check their validity. Checking
segments once and for all in vm_phys_add_seg() avoids duplicating
validity tests and is possible since early segments are not used before
being poured into phys_segs[]. Finally, vm_phys_add_seg() is not
performance critical.
Allow empty segments and discard them (silently, unless 'bootverbose' is
true), as vm_page_startup() was testing for this case before calling
vm_phys_add_seg(), and we felt that vm_phys_early_startup() needed the
same test before calling vm_phys_add_seg(). As a consequence, remove the
empty segment test from vm_page_startup().
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48627
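A sketch of the checks described above, assuming 4 KiB pages. The hypothetical toy_check_seg() classifies a [start, end) segment; a caller would panic on TOY_SEG_BAD and drop TOY_SEG_EMPTY.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_MASK 0xfffULL	/* assumption: 4 KiB pages */

enum toy_seg_check { TOY_SEG_OK, TOY_SEG_EMPTY, TOY_SEG_BAD };

/* Bad if start > end or a bound is unaligned; empty if start == end. */
static enum toy_seg_check
toy_check_seg(uint64_t start, uint64_t end)
{
	if (start > end || (start & TOY_PAGE_MASK) != 0 ||
	    (end & TOY_PAGE_MASK) != 0)
		return (TOY_SEG_BAD);
	if (start == end)
		return (TOY_SEG_EMPTY);
	return (TOY_SEG_OK);
}
```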
The passed index must be the start of a chunk in phys_avail[], and so
must be even. Test for that and print a separate panic message.
While here, fix two panic messages: in one, the wrong chunk boundary was
printed, and in the other, the desired condition rather than the actual
one was printed, possibly leading to confusion.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48626
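A sketch of the index validation. The hypothetical helper returns a message describing the condition that actually failed, or NULL when the index is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chunks are [start, end) pairs, so a chunk starts at an even index. */
static const char *
toy_check_chunk_index(const uint64_t *avail, size_t n, size_t i)
{
	if ((i & 1) != 0)
		return ("chunk index is odd");
	if (i + 1 >= n)
		return ("chunk index out of bounds");
	if (avail[i] > avail[i + 1])
		return ("chunk start is greater than its end");
	return (NULL);
}
```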
An assignment in collapse_scan() has become useless because, on every
path, another assignment to that variable overrides it before that
variable is read. Another assignment can be avoided sometimes, so
move it down in the loop to where it's really necessary.
Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D49017
Firstly, pagecount is a u_long so we should ensure j is the same for the
sake of 64-bit systems. Secondly, ptoa is just a macro, and does not
cast its argument, so in order to handle PAE systems correctly we need
to cast j to vm_paddr_t (the type of startp).
Fixes: 0078df5f02 ("vm_phys: reduce touching of page->pool fields")
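A sketch of the PAE problem with hypothetical names: toy_ptoa() is a bare shift like ptoa(), and a 32-bit j stands in for u_long on i386. Page 0x100000 lies at 4 GiB, so shifting it in 32 bits wraps to 0, while casting to the 64-bit physical-address type first gives the right offset.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
/* Like ptoa(): a plain shift that does not cast its argument. */
#define toy_ptoa(x) ((x) << TOY_PAGE_SHIFT)

typedef uint64_t toy_paddr_t;	/* 64-bit physical addresses, as with PAE */

/* j is 32 bits wide, so the shift is done in 32 bits and can wrap. */
static toy_paddr_t
toy_offset_uncast(uint32_t j)
{
	return (toy_ptoa(j));
}

/* Casting first makes the shift happen in 64 bits. */
static toy_paddr_t
toy_offset_cast(uint32_t j)
{
	return (toy_ptoa((toy_paddr_t)j));
}
```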
Change the usage of the pool field in vm_page structs.
Currently, every page belongs to a pool, and the pool field identifies
that pool, whether the page is allocated or free.
With this change, the pool field of the first page of a free block is
used by the buddy allocator to identify its pool, but the buddy
allocator makes no guarantees about the pool field value for allocated
pages. The buddy allocator requires that a pool parameter be passed as
part of freeing memory. A function that allocates memory may use the
pool field of a page to record what pool to pass as that parameter
when the memory is freed, but might not need to do so for every
allocated page.
Suggested by: alc
Reviewed by: markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D45409
Change the usage of the pool field in vm_page structs.
Currently, every page belongs to a pool, and the pool field identifies
that pool, whether the page is allocated or free.
With this change, the pool field of the first page of a free block is
used by the buddy allocator to identify its pool, but the buddy
allocator makes no guarantees about the pool field value for allocated
pages. The buddy allocator requires that a pool parameter be passed as
part of freeing memory. A function that allocates memory may use the
pool field of a page to record what pool to pass as that parameter
when the memory is freed, but might not need to do so for every
allocated page.
Suggested by: alc
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D45409
The ktls buffer zone allocates 16k contiguous buffers, and often needs
to call vm_page_reclaim_contig_domain_ext() to free up contiguous
memory, which can be expensive. Web servers which have a daily
pattern of peaks and troughs end up having UMA trim the
ktls_buffer_zone when they are in their trough, and end up re-building
it on the way to their peak.
Rather than calling vm_page_reclaim_contig_domain_ext() multiple times
on a daily basis, let's mark the ktls_buffer_zone with a new UMA flag,
UMA_ZONE_NOTRIM. This disables UMA_RECLAIM_TRIM on the zone, but
allows UMA_RECLAIM_DRAIN* operations, so that if we become extremely
short of memory (vm_page_count_severe()), the UMA reclaim worker can
still free up memory.
Note that UMA_ZONE_UNMANAGED already exists, but can never be drained
or trimmed, so it may hold on to memory during times of severe memory
pressure. Using UMA_ZONE_NOTRIM rather than UMA_ZONE_UNMANAGED is an
attempt to keep this zone more reactive in the face of severe memory
pressure.
Sponsored by: Netflix
Reviewed by: jhb, kib, markj (via slack)
Differential Revision: https://reviews.freebsd.org/D48451
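A sketch of the reclaim policy described above. The flag values and toy_zone_reclaimable() are hypothetical, not UMA's actual definitions or reclaim code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ZONE_UNMANAGED	0x1	/* hypothetical values */
#define TOY_ZONE_NOTRIM		0x2

enum toy_reclaim { TOY_RECLAIM_TRIM, TOY_RECLAIM_DRAIN };

/*
 * May this reclaim request release memory cached by the zone? Unmanaged
 * zones never give memory back; NOTRIM zones ignore trims but still
 * honor drains.
 */
static bool
toy_zone_reclaimable(unsigned int flags, enum toy_reclaim req)
{
	if ((flags & TOY_ZONE_UNMANAGED) != 0)
		return (false);
	if (req == TOY_RECLAIM_TRIM && (flags & TOY_ZONE_NOTRIM) != 0)
		return (false);
	return (true);
}
```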
Right now we have the vm.pageout_cpus_per_thread tunable, which controls
how many pageout threads are started in each NUMA domain (one per that
many CPUs in the domain), but after booting, it's not possible to
disable multi-threaded scanning.
There is at least one workload where this mechanism doesn't work well;
let's make it possible to disable it without a reboot, to simplify
troubleshooting.
Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D48377
Extend flags to u_int.
Move system_map and needs_wakeup bools into flags.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D47934
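A sketch of folding two bools into a u_int flags word. The flag names, bit values and struct are hypothetical, not those used by struct vm_map.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAP_SYSTEM_MAP	0x0001u	/* hypothetical bit assignments */
#define TOY_MAP_NEEDS_WAKEUP	0x0002u

struct toy_map {
	unsigned int flags;	/* replaces a narrower flags field and two bools */
};

static bool
toy_map_is_system_map(const struct toy_map *map)
{
	return ((map->flags & TOY_MAP_SYSTEM_MAP) != 0);
}

static void
toy_map_set_needs_wakeup(struct toy_map *map, bool on)
{
	if (on)
		map->flags |= TOY_MAP_NEEDS_WAKEUP;
	else
		map->flags &= ~TOY_MAP_NEEDS_WAKEUP;
}
```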
This saves 616 - 584 = 32 bytes per struct vmspace on amd64, which allows
packing 7 vmspaces per page vs. 6 for the non-overlapping layout.
I used the anonymous union member feature to avoid too much churn.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47934
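A sketch of the two ideas with hypothetical types: an anonymous union (C11) lets members that are never live at the same time share storage while still being named s.a and s.b, and a one-line helper checks the packing arithmetic above, ignoring any per-slab overhead.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

/* How many objects of a given size fit in one page. */
static size_t
toy_objs_per_page(size_t objsize)
{
	return (TOY_PAGE_SIZE / objsize);
}

/* Members a and b are assumed never to be in use at the same time. */
struct toy_separate {
	long a[4];
	long b[4];
};

struct toy_overlap {
	union {		/* anonymous: still accessed as s.a and s.b */
		long a[4];
		long b[4];
	};
};
```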
Pass the to-be-freed page to vm_page_iter_remove as a parameter,
rather than computing it from the iterator parameter, to improve
performance.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D47730
Rename vm_page_rename() to vm_page_iter_rename() to reflect its
reimplementation using iterators, and pass the page to this function
rather than spending clock cycles looking it up. Change its return
value from 0/1 to a bool.
Reviewed by: dougm, markj
Differential Revision: https://reviews.freebsd.org/D47829
Correct swap_pager_seek_data so that, when the first lookup finds no
valid pages, second and subsequent lookups are attempted anyway.
This was broken by db08b0b04d.
Reported by: marklmi@yahoo.com
Reviewed by: kib
Tested by: marklmi@yahoo.com
Fixes: db08b0b04d tmpfs_vnops: move swap work to swap_pager
Differential Revision: https://reviews.freebsd.org/D47767
Pass the to-be-freed page to vm_page_iter_free as a parameter, rather
than computing it from the iterator parameter, to improve performance.
Sort declarations of page_iter functions in vm_page.h.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D47727
Change cdev_mgtdev_page_free_page to take an iterator, rather than an
object and page, so that removing the page from the object radix tree
can take advantage of locality with iterators. Define a
general-purpose function to free all pages, which can be used in
several places.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47692
The previous change committed a preliminary version of the change to
use iterators to free page sequences. This updates to what was
intended to be the final version.
Reviewed by: markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D46724
Use pctrie iterators for removing some page sequences from radix
trees, to avoid repeated searches from the tree root.
Rename vm_page_object_remove to vm_page_remove_radixdone, and remove
from it the responsibility for removing a page from its radix tree,
and pass that responsibility on to its callers.
For one of those callers, vm_page_rename, pass a pages pctrie_iter,
rather than a page, and use the iterator to remove the page from its
radix tree.
Define functions vm_page_iter_remove() and vm_page_iter_free() that
are like vm_page_remove() and vm_page_free(), respectively, except
that they take an iterator as parameter rather than a page, and use
the iterator to remove the page from the radix tree instead of
searching the radix tree. Function vm_page_iter_free() assumes that
the page is associated with an object, and calls
vm_page_free_object_prep to do the part of vm_page_free_prep that is
object-related.
In functions vm_object_split and vm_object_collapse_scan, use a
pctrie_iter to walk over the pages of the object, and use
vm_page_rename and vm_radix_iter_remove to modify the radix tree without
searching for pages. In vm_object_page_remove and _kmem_unback, use a
pctrie_iter and vm_page_iter_free to remove the page from the radix
tree.
Reviewed by: markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D46724
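A sketch of why iterators help, assuming a sorted array stands in for the radix tree (toy_iter and its functions are hypothetical). One search positions the iterator, and each further step reuses that position instead of searching again from the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_iter {
	const unsigned long *keys;	/* sorted page indices */
	size_t n;
	size_t pos;			/* cached position of the next key */
};

/* One full (binary) search: position at the first key >= start. */
static void
toy_iter_init(struct toy_iter *it, const unsigned long *keys, size_t n,
    unsigned long start)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < start)
			lo = mid + 1;
		else
			hi = mid;
	}
	it->keys = keys;
	it->n = n;
	it->pos = lo;
}

/* Step to the next key below end in O(1); false when the run is over. */
static bool
toy_iter_next(struct toy_iter *it, unsigned long end, unsigned long *key)
{
	if (it->pos >= it->n || it->keys[it->pos] >= end)
		return (false);
	*key = it->keys[it->pos++];
	return (true);
}
```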
This function is only intended for the internal use of the VM system.
Reviewed by: dougm, kib, markj
Differential Revision: https://reviews.freebsd.org/D47644
Restructure a bit of code to allow vm_page_alloc_contig_domain to use
pctrie iterators for lookup and insertion into the object radix tree,
to improve performance.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D47036
Readahead/behind pages are handled by the swap pager, but the get_pages
caller is responsible for putting fetched pages into queues (or wiring
them beforehand).
Note that the VM object lock prevents the newly queued page from being
immediately reclaimed in the window before it is marked dirty by
swap_pager_swapoff_object().
Reported by: pho
Tested by: pho
Reviewed by: dougm, alc, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47526
Add a function to the vm_radix interface to look up the page with the
greatest index less than or equal to a given index.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D47046
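A sketch of the lookup semantics on a sorted array of page indices rather than a radix tree. toy_lookup_le() is hypothetical and returns the position of the match, or -1 when every index is greater.

```c
#include <assert.h>
#include <stddef.h>

/* Position of the greatest key <= index in sorted keys[], or -1. */
static long
toy_lookup_le(const unsigned long *keys, size_t n, unsigned long index)
{
	size_t lo = 0, hi = n;

	/* Find the first key > index; the answer is just before it. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] <= index)
			lo = mid + 1;
		else
			hi = mid;
	}
	return ((long)lo - 1);
}
```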
It is a subtype-specific handle. Mark OBJT_DEVICE objects that do store
a cdev in the handle with a new object flag, OBJ_CDEVH.
PR: 282533
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47443
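A sketch of the idea with hypothetical types and a hypothetical flag value: the handle is interpreted as a cdev only when the object carries the flag, so code never treats an arbitrary handle as a struct cdev *.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OBJ_CDEVH 0x1	/* hypothetical bit value */

struct toy_cdev {
	const char *name;
};

struct toy_vmobj {
	unsigned int flags;
	void *handle;	/* meaning depends on the object's subtype */
};

/* Return the cdev only when the flag says the handle holds one. */
static struct toy_cdev *
toy_vmobj_cdev(const struct toy_vmobj *obj)
{
	if ((obj->flags & TOY_OBJ_CDEVH) == 0)
		return (NULL);
	return (obj->handle);
}
```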
because it is not necessarily a struct cdev *.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47443
There is no actual need for the VM object to be locked when initializing
a VM page iterator.
Reviewed by: dougm
Differential Revision: https://reviews.freebsd.org/D47298
Use pctrie iterators for swblk traversal in more swap_pager
functions: swap_pager_haspage, swp_pager_meta_lookup, and
swap_pager_getpages.
Reported by: markj
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47232
An assertion that an object was write-locked should instead be an
assertion that the object is read-locked.
Reported by: Jenkins
Fixes: db08b0b04d tmpfs_vnops: move swap work to swap_pager
Differential Revision: https://reviews.freebsd.org/D47278
Two functions in tmpfs_vnops.c use an interface provided by
swap_pager.c. Move most of the implementation of those functions to
swap_pager.c so that they can be implemented more effectively, with
access to implementation details of the swap pager.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47212
Move vm_object_scan_all_shadowed from vm_object.c to swap_pager.c, and
rename it. In the moved function, use vm_page and swblk iterators to
advance through the objects. Avoid checking a backing page for
busyness or validity more than once, or when it is beyond the upper
bound of the scan.
Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D47150
Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
Otherwise, on systems with no swap, the total amount of memory visible
to tools like top(1) decreases.
It doesn't seem very useful to have a dedicated counter for unswappable
pages, and updating applications accordingly would be painful, so just
lump them in with laundry for now.
PR: 280846
Reviewed by: bnovkov, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47216
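A sketch of the accounting change, with a hypothetical queue enumeration and counter array: the reported laundry figure includes unswappable pages, so the totals a tool sums still cover all memory on a system without swap.

```c
#include <assert.h>

/* Hypothetical queue indices, modeled on the page queue names above. */
enum toy_queue {
	TOY_PQ_INACTIVE,
	TOY_PQ_ACTIVE,
	TOY_PQ_LAUNDRY,
	TOY_PQ_UNSWAPPABLE,
	TOY_PQ_COUNT
};

/* Report laundry as laundry plus unswappable pages. */
static unsigned long
toy_laundry_count(const unsigned long counts[TOY_PQ_COUNT])
{
	return (counts[TOY_PQ_LAUNDRY] + counts[TOY_PQ_UNSWAPPABLE]);
}
```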