Use u_long for memory accesses instead of uint32_t. In my tests on
amd64 this reduces the time spent in those functions by ~30%, thanks to
wider 64-bit accesses. i386 still uses 32-bit accesses.
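
For illustration only, a minimal sketch of the idea, not the code touched
by this commit: scanning a buffer one unsigned long at a time. The helper
name buf_is_zero() is hypothetical. On LP64 platforms such as amd64,
unsigned long is 64 bits wide, so the loop makes half as many accesses as
a uint32_t loop; on i386 it stays 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true if the first 'len' bytes of 'buf' are
 * all zero.  'buf' must be aligned for unsigned long and 'len' must be a
 * multiple of sizeof(unsigned long), as is the case for a whole page.
 */
static bool
buf_is_zero(const void *buf, size_t len)
{
	const unsigned long *p = buf;
	size_t i, n;

	assert(len % sizeof(*p) == 0);
	n = len / sizeof(*p);
	/* One word-sized load per iteration. */
	for (i = 0; i < n; i++) {
		if (p[i] != 0)
			return (false);
	}
	return (true);
}
```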
MFC after: 1 month
(cherry picked from commit 7c566d6cfc7bfb913bad89d87386fa21dce8c2e6)
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in Bug
274592. That error concerns failing to check segment boundaries. This
change fixes that error in vm_phys_find_freelist_contig, resolving
the bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.
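
The kind of boundary check at issue can be sketched as follows. This is
an illustration only, not the code from this change: struct seg and
range_in_seg() are hypothetical stand-ins for the physical segment
descriptor and for whatever test the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical physical segment covering addresses [start, end). */
struct seg {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if the run of 'size' bytes starting at 'pa' lies entirely
 * within 's'.  A run built by joining adjacent free blocks needs a test
 * like this, since blocks adjacent in address may belong to different
 * segments.
 */
static bool
range_in_seg(const struct seg *s, uint64_t pa, uint64_t size)
{
	assert(s->start <= s->end);
	/* Written to avoid overflow in pa + size. */
	return (pa >= s->start && pa <= s->end && size <= s->end - pa);
}
```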
PR: 274592
Reported by: shafaisal.us@gmail.com
Reviewed by: alc, markj
Tested by: shafaisal.us@gmail.com
Fixes: fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D42509
(cherry picked from commit 2a4897bd4e1bd8430d955abd3cf6675956bb9d61)
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).
The first candidate consumer that comes to mind is 'struct thread'; see
the next commit.
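
A minimal userland sketch of the idea, using C11 alignas rather than the
kernel's own macros. CACHE_LINE, LOW_ZERO_BITS, MAX_ALIGN and struct
tagged are hypothetical names, and the 64-byte cache line is an
assumption.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define	CACHE_LINE	64	/* assumed cache line size */
#define	LOW_ZERO_BITS	8	/* hypothetical: low pointer bits kept at 0 */
#define	MAX_ALIGN(a, b)	((a) > (b) ? (a) : (b))

/* Aligned to the larger of the cache line and 2^LOW_ZERO_BITS bytes. */
struct tagged {
	alignas(MAX_ALIGN(CACHE_LINE, 1 << LOW_ZERO_BITS)) int first;
	int second;
};

static_assert(alignof(struct tagged) >= CACHE_LINE,
    "struct tagged must be at least cache line aligned");

/* Return nonzero if the low LOW_ZERO_BITS bits of 'p' are all zero. */
static int
low_bits_clear(const struct tagged *p)
{
	return (((uintptr_t)p & (((uintptr_t)1 << LOW_ZERO_BITS) - 1)) == 0);
}
```

Taking the maximum of the two alignments keeps both properties: the low
pointer bits are free for other uses, and objects still start on a cache
line boundary.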
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42265
(cherry picked from commit 733e0abd2897289e2acf70f7c72e31a5a560394a)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.
Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).
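
A minimal userland sketch of the two checks described above. It is not
the kernel function: the real one asserts under INVARIANTS, whereas the
hypothetical align_mask_ok() here returns a boolean and
check_align_mask_sketch() wraps it in a plain assert().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Return true if 'mask' fits in a signed int and mask + 1 (the
 * alignment) is a power of two.
 */
static bool
align_mask_ok(unsigned int mask)
{
	if (mask > (unsigned int)INT_MAX)
		return (false);
	/* mask + 1 is a power of two iff it shares no bits with mask. */
	return (((mask + 1) & mask) == 0);
}

/* Assert-style wrapper, loosely analogous to a KASSERT()-based check. */
static void
check_align_mask_sketch(unsigned int mask)
{
	assert(align_mask_ok(mask));
}
```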
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42263
(cherry picked from commit 87090f5e5a7b927a2ab30878435f6dcba0705a1d)
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignments
are stored internally as signed integers. Such big values do not make
sense anyway and indicate some programming error. A future commit will
introduce checks for this case and other ones.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42262
(cherry picked from commit 3d8f548b9e5772ff6890bdc01f7ba7b76203857d)
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask. So suppress it and replace its use with a call to
uma_get_align_mask(). The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42259
(cherry picked from commit e557eafe7233f8231c1f5f5b098e4bab8e818645)
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.
Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one. Rename the stem to something more explicit. Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.
Hide 'uma_align_cache' to ensure that it cannot be set in any other way
than by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit. While here, rename it to
'uma_cache_align_mask'.
This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42258
(cherry picked from commit dc8f7692fd1de628814f4eaf4a233dccf4c92199)
The loader tunable 'vm.numa.disabled' does not have a corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit c415cfc8be1b732a80f1ada6d52091e08eeb9ab5)
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have a
corresponding sysctl MIB entry. Add it so that it can be retrieved, and
`sysctl -T` will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit a55fbda874db31b804490567c69502c891b6ff61)
When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question. The pages corresponding to those nodes are paged in.
While paging in, we drop the VM object lock. Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter. vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.
However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages. Then,
swap_pager_swapoff_object() ends up unbusying a freed page. Fix the
problem by acquiring the lock before waking up sleepers.
PR: 273610
Reported by: Graham Perrin <grahamperrin@gmail.com>
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42029
(cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.
Arm64 uses a somewhat naive iteration over CPUs and matches the current
vmspace's pmap against the argument. It is not guaranteed that
vmspace->pmap is the same as the active pmap, but the inaccuracy should
be tolerable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32360
Also, rename min_addr to default_addr, which better reflects what it
represents. The min_addr is not a minimum address in the same way that
max_addr is actually a maximum address that can be allocated. For
example, a non-zero hint can be less than min_addr and be allocated.
Reported by: dchagin
Reviewed by: dchagin, kib, markj
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41397
From the Linux man page for mprotect(2):
PROT_GROWSDOWN
Apply the protection mode down to the beginning of a mapping
that grows downward (which should be a stack segment or a
segment mapped with the MAP_GROWSDOWN flag set).
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
mprotect(2) on the stack region needs to adjust the protection stored in
the guard entry, so that e.g. enabling execution on the stack works
properly on stack growth.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Restructure the first phase slightly, to facilitate further changes.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Do not assume that protection is the same as max_protection. Store both in
offset, packed in the same way as the prot syscall parameter.
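
A minimal sketch of packing both values into one integer, assuming the
maximum protection sits 16 bits above the current protection, as
PROT_MAX() does for the prot argument. The PK_ names, the helper
functions and the shift value are assumptions made for illustration, not
the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define	PK_PROT_ALL	0x7u	/* read | write | execute */
#define	PK_MAX_SHIFT	16	/* assumed position of the max protection */

/* Pack the current and maximum protection into one value. */
static uint64_t
prot_pack(unsigned int prot, unsigned int max_prot)
{
	assert((prot & ~PK_PROT_ALL) == 0);
	assert((max_prot & ~PK_PROT_ALL) == 0);
	return ((uint64_t)prot | ((uint64_t)max_prot << PK_MAX_SHIFT));
}

/* Recover the current protection. */
static unsigned int
prot_extract(uint64_t packed)
{
	return ((unsigned int)(packed & PK_PROT_ALL));
}

/* Recover the maximum protection. */
static unsigned int
prot_max_extract(uint64_t packed)
{
	return ((unsigned int)((packed >> PK_MAX_SHIFT) & PK_PROT_ALL));
}
```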
Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.
The comment update for vm_map_try_merge_entries() was suggested by dougm.
Suggested by: alc
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Only a part of the object may be mapped.
Noted by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
There is no list connecting all entries any more, and correspondingly no
order on the list entries.
Reviewed by: dougm
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D41405
Rewrite the final loop in vm_phys_enqueue_contig as a new function,
vm_phys_enq_beg, to reduce amd64 code size.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41289
Do not assume that, when vm_phys_enq_range is passed npages==0, the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.
Reported by: karels
Differential Revision: https://reviews.freebsd.org/D41317
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enq_range for part of its work also saves a few
bytes.
The amd64 object code shrinks by 128 bytes.
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D41154
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enq_range for part of its work also saves a few
bytes.
The amd64 object code shrinks by 80 bytes.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41154
The resulting code is a bit more concise. No functional change
intended.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41249
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.
This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.
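
One way to get the slot as a by-product of the barrier test can be
sketched as follows. This is a sketch under stated assumptions, not the
committed code: struct node, its field names, and WIDTH = 4 are
hypothetical, 'level' is assumed to be a shift count, and 'owner' is
assumed to have all bits below level + WIDTH cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	WIDTH	4			/* hypothetical bits per level */
#define	COUNT	(1u << WIDTH)		/* children per node */

/* Hypothetical node: 'owner' is the key prefix, 'level' a shift count. */
struct node {
	uint64_t owner;
	unsigned int level;
};

/*
 * Return true if 'index' lies outside the subtree rooted at 'n'.
 * Otherwise store the child slot for 'index' in '*slot' and return false.
 */
static bool
keybarr(const struct node *n, uint64_t index, unsigned int *slot)
{
	uint64_t idx;

	assert(n->level < 64);
	/* Unsigned wraparound makes index < owner yield a large value. */
	idx = (index - n->owner) >> n->level;
	if (idx >= COUNT)
		return (true);
	*slot = (unsigned int)idx;
	return (false);
}
```

A search loop can then test keybarr() and, on a false return, index the
child array with the slot it was handed instead of computing it again.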
Reviewed by: alc
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41235
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.
For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the callee, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.
This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.
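
A minimal sketch of the two helpers with the level stored pre-multiplied,
assuming 64-bit keys and WIDTH = 4. The macro values and function bodies
are illustrative assumptions, not copies of the committed code.

```c
#include <assert.h>
#include <stdint.h>

#define	WIDTH	4				/* assumed bits per level */
#define	COUNT	((uint64_t)1 << WIDTH)
#define	MASK	(COUNT - 1)

/* 'level' is the node's level already multiplied by WIDTH. */
static unsigned int
slot(uint64_t index, unsigned int level)
{
	assert(level < 64);
	/* No multiplication needed: 'level' is already a shift count. */
	return ((unsigned int)((index >> level) & MASK));
}

/* Clear the bits of 'index' below level + WIDTH. */
static uint64_t
trimkey(uint64_t index, unsigned int level)
{
	assert(level < 64);
	/* COUNT << level wraps to 0 at the top level, so the result is 0. */
	return (index & -(COUNT << level));
}
```

Shifting COUNT by level, rather than 1 by level + WIDTH, keeps the shift
count below 64 even at the top level, so no extra range test is needed.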
Reviewed by: kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41226
By replacing NULL (non-leaf) pointers with NULL leaves, a NULL test is
removed from every iteration of an index-based search loop.
This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.
Reviewed by: alc, kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41171
Fix the handling of address hints that are less than min_addr by
vm_map_find_min().
Reported by: dchagin
Reviewed by: kib
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41159
If mprotect(2) changed protection at the bottom of the currently grown
stack region, the changed protection would currently be used for stack
growth on the next fault. This is arguably unexpected.
Store the original protection for the entry at mmap(2) time in the
offset member of the gap vm_map_entry, and use it for protection of the
grown stack region.
PR: 272585
Reported by: John F. Carr <jfc@mit.edu>
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41089
Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.
The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced by 20% in a buildworld test.
Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40936
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971
Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40807
Replace the 'count' field in a trie node with a bitmap that
identifies non-NULL children. Drop the 'last' field, and use the
last bit set in the bitmap instead. In lookup_le, lookup_ge,
remove, and reclaim_all, use the bitmap to find the
previous/next/only/every non-NULL child in constant time by
examining the bitmap instead of looping across array elements
and NULL-checking them one-by-one.
A buildworld test suggests that this reduces the cycle count on
those functions that eliminate some null-checks by 4.9%, 1.5%,
0.0% and 13.3%.
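
A minimal sketch of the bitmap technique, assuming 16 children per node
and using POSIX ffs(). The names next_child() and only_child() are
hypothetical and do not come from the committed code.

```c
#include <assert.h>
#include <strings.h>

#define	COUNT	16	/* hypothetical children per node */

/*
 * Return the smallest occupied slot >= 'slot' in 'popmap', where bit i
 * set means child i is non-NULL, or -1 if there is none.
 */
static int
next_child(unsigned int popmap, unsigned int slot)
{
	unsigned int mask;

	assert(slot < COUNT);
	/* Keep only the bits at or above 'slot'. */
	mask = popmap & ~((1u << slot) - 1);
	/* ffs() returns the 1-based index of the lowest set bit, or 0. */
	return (ffs((int)mask) - 1);
}

/* Return nonzero if exactly one child is present. */
static int
only_child(unsigned int popmap)
{
	return (popmap != 0 && (popmap & (popmap - 1)) == 0);
}
```

Neither helper touches the child array, which is where the saved
null-checks come from.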
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40775
This way a possible clash between FAULT_* and KERN_* numbering is
avoided, and panic checks for fault_status confusion become more
efficient.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40771
Let node_get calculate its own owner value. Don't pass the count
parameter, since it's always 2. Save 16 bytes in insert(). Move,
without modifying, slot and trimkey to handle a use-before-declaration
problem.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D40723