Commit graph

4794 commits

Author SHA1 Message Date
Alexander Motin
58f5c260a2 uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t.  On my tests on
amd64 this by ~30% reduces time spent in those functions thanks to
bigger 64bit accesses.  i386 still uses 32bit accesses.

MFC after:	1 month

(cherry picked from commit 7c566d6cfc7bfb913bad89d87386fa21dce8c2e6)
2023-12-08 21:32:43 -05:00
Doug Moore
210fce73ae vm_phys: fix freelist_contig
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in In Bug
274592. That error concerns failing to check segment boundaries. This
change fixes an error in vm_phys_find_freelist_config that resolves
that bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.

PR:		274592
Reported by:	shafaisal.us@gmail.com
Reviewed by:	alc, markj
Tested by:	shafaisal.us@gmail.com
Fixes:	fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D42509

(cherry picked from commit 2a4897bd4e1bd8430d955abd3cf6675956bb9d61)
2023-11-24 19:19:05 -06:00
Olivier Certner
25e0e25afd uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).

First candidate consumer that comes to mind is 'struct thread', see next
commit.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42265

(cherry picked from commit 733e0abd2897289e2acf70f7c72e31a5a560394a)
2023-11-16 10:07:18 -05:00
Olivier Certner
7deedba4e5 uma: New check_align_mask(): Validate alignments (INVARIANTS)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.

Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42263

(cherry picked from commit 87090f5e5a7b927a2ab30878435f6dcba0705a1d)
2023-11-16 10:07:17 -05:00
Olivier Certner
690ca45aeb uma: Make the cache alignment mask unsigned
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers.  Such big values do not make
sense anyway and indicate some programming error.  A future commit will
introduce checks for this case and other ones.

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42262

(cherry picked from commit 3d8f548b9e5772ff6890bdc01f7ba7b76203857d)
2023-11-16 10:07:16 -05:00
Olivier Certner
c98dded0c7 uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask.  So suppress it and replace its use with a call to
uma_get_align_mask().  The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42259

(cherry picked from commit e557eafe7233f8231c1f5f5b098e4bab8e818645)
2023-11-16 10:07:11 -05:00
Olivier Certner
4587326893 uma: Hide 'uma_align_cache'; Create/rename accessors
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.

Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one.  Rename the stem to something more explicit.  Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.

Hide 'uma_align_cache' to ensure that it cannot be set in any other way
then by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit.  While here, rename it to
'uma_cache_align_mask'.

This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42258

(cherry picked from commit dc8f7692fd1de628814f4eaf4a233dccf4c92199)
2023-11-16 10:07:07 -05:00
Zhenlei Huang
e26b7e8d02 vm_phys: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.numa.disabled' does not have corresponding sysctl
MIB entry. Add it so that it can be retrieved, and `sysctl -T` will also
report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138

(cherry picked from commit c415cfc8be1b732a80f1ada6d52091e08eeb9ab5)
2023-10-19 22:00:57 +08:00
Zhenlei Huang
cb5bc8a748 vm_page: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138

(cherry picked from commit a55fbda874db31b804490567c69502c891b6ff61)
2023-10-19 22:00:57 +08:00
Gordon Bergling
44e3ce37f2 uma.h: Fix a typo in a source code comment
- s/setable/settable/

(cherry picked from commit fc9f1d2c6391b1a4b133aab56ace625b72c9ea85)
2023-10-18 07:57:16 +02:00
Mark Johnston
aa229a59ad swap_pager: Fix a race in swap_pager_swapoff_object()
When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question.  The pages corresponding to those nodes are paged in.

While paging in, we drop the VM object lock.  Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter.  vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.

However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages.  Then,
swap_pager_swapoff_object() ends up unbusying a freed page.  Fix the
problem by acquiring the lock before waking up sleepers.

PR:		273610
Reported by:	Graham Perrin <grahamperrin@gmail.com>
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42029

(cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)
2023-10-08 20:41:35 -04:00
Konstantin Belousov
8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.

Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Konstantin Belousov
5f452214f2 vm_map.c: fix syntax
Fixes:	c718009884
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-08-18 16:37:16 +03:00
Konstantin Belousov
c718009884 vm_map.c: plug several more places which might modify entry->offset
for the GUARD entries protecting stacks gaps.

syzkaller: https://syzkaller.appspot.com/bug?extid=c325d6a75e4fd0a68714
Reviewed by:	dougm, markj (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41475
2023-08-18 15:43:35 +03:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Dmitry Chagin
f3e11927dc vm: Allow MAP_32BIT for all architectures
Reviewed by:		alc, kib, markj
Differential revision:	https://reviews.freebsd.org/D41435
2023-08-14 20:20:20 +03:00
Dmitry Chagin
0ddd32b617 vm: MAP_32BIT_MAX_ADDR defined in sys/mman.h
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D41434
2023-08-14 20:18:30 +03:00
Alan Cox
37e5d49e1e vm: Fix address hints of 0 with MAP_32BIT
Also, rename min_addr to default_addr, which better reflects what it
represents.  The min_addr is not a minimum address in the same way that
max_addr is actually a maximum address that can be allocated.  For
example, a non-zero hint can be less than min_addr and be allocated.

Reported by:	dchagin
Reviewed by:	dchagin, kib, markj
Fixes:	d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision:	https://reviews.freebsd.org/D41397
2023-08-12 02:35:21 -05:00
Konstantin Belousov
9b65fa6940 linuxolator: implement Linux' PROT_GROWSDOWN
From the Linux man page for mprotect(2):
   PROT_GROWSDOWN
       Apply  the  protection  mode  down to the beginning of a mapping
       that grows downward (which should be a stack segment or a
       segment mapped with the MAP_GROWSDOWN flag set).

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
Konstantin Belousov
90049eabcf vm_map_protect(): add VM_MAP_PROTECT_GROWSDOWN flag
which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
Konstantin Belousov
b6037edbd1 vm_map_growstack(): restore stack gap data if gap entry was removed
and then restored.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
9d7ea6cff7 vm_map: do not allow to merge stack gap entries
At least, offset handling is wrong for them.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
55be6be12c vm_map_protect(): handle stack protection stored in the stack guard
mprotect(2) on the stack region needs to adjust guard stored protection,
so that e.g. enable executing on stack worked properly on stack growth.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
79169929f0 vm_map_protect(): move guard handling at the last phase into an empty dedicated helper
Restructure the first phase slightly, to facilitate further changes.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
aa928a5216 vm_map_growstack(): handle max protection for stacks
Do not assume that protection is same as max_protection.  Store both in
offset, packed in the same way as the prot syscall parameter.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
0fb6aae7f0 vm_map.c: add CONTAINS_BITS macro
Suggested by:	dougm
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
ba41b0de3e Add vm_map_insert1(9)
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.

The comment update for vm_map_try_merge_entries() was suggested by dougm.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
3b44ee50be vm_map_insert(): update herald comment
Only a part of the object may be mapped.

Noted by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
9da33e8d10 Update comment describing struct vm_map
There is no list connecting all entries any more, and correspondingly no
order on the list entries.

Reviewed by:	dougm
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D41405
2023-08-10 09:01:26 +03:00
Doug Moore
e77f4e7f59 vm_phys: tune vm_phys_enqueue_contig loop
Rewrite the final loop in vm_phys_enqueue_contig as a new function,
vm_phys_enq_beg, to reduce amd64 code size.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41289
2023-08-04 21:09:39 -05:00
Doug Moore
ccdb28275d vm_phys_enq_range: no alignment assert for npages==0
Do not assume that when vm_phys_enq_range is passed npages==0 that the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.

Reported by:	karels
Differential Revision:	https://reviews.freebsd.org/D41317
2023-08-04 13:41:59 -05:00
Doug Moore
c9b06fa527 vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.

The amd64 object code shrinks by 128 bytes.

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D41154
2023-08-03 09:19:48 -05:00
Doug Moore
b7370efade Revert "vm_phys_enqueue_contig: handle npages==0"
This reverts commit 1a7fcf6d51.

Peter Holm reported a problem, so I'm reverting now and looking for
the problem later.
2023-08-02 04:33:40 -05:00
Doug Moore
1a7fcf6d51 vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.

The amd64 object code shrinks by 80 bytes.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41154
2023-08-01 22:12:00 -05:00
Mark Johnston
d0e4e53ebd vm_map: Add a macro to fetch a map entry's split boundary index
The resulting code is a bit more concise.  No functional change
intended.

Reviewed by:	alc, dougm, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D41249
2023-08-01 10:10:02 -04:00
Doug Moore
ac0572e660 radix_tree: compute slot from keybarr
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.

This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.

Reviewed by:	alc
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41235
2023-07-30 15:12:06 -05:00
Doug Moore
38f5cb1bfb radix_tree: redefine the clev field
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.

For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the caller, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.

This takes 3 instrutions and 14 bytes out of the basic lookup loop on
amd64.

Reviewed by:	kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41226
2023-07-30 01:20:07 -05:00
Doug Moore
2d2bcba7ba Every path in a radix trie ends with a leaf or a NULL. By replacing
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.

This speeds up radix trie searches by few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.

Reviewed by:	alc, kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41171
2023-07-28 11:39:52 -05:00
Alan Cox
5ec2d94ade vm_mmap_object: Update the spelling of true/false
Since fitit is already a bool, use true/false instead of TRUE/FALSE.

MFC after:	2 weeks
2023-07-27 00:25:53 -05:00
Alan Cox
50d663b14b vm: Fix vm_map_find_min()
Fix the handling of address hints that are less than min_addr by
vm_map_find_min().

Reported by:	dchagin
Reviewed by:	kib
Fixes:	d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision:	https://reviews.freebsd.org/D41159
2023-07-26 00:24:50 -05:00
Konstantin Belousov
db6c7c7f8d vmspace_fork(): do not override offset for the guard entries
The offset field contains protection for the stack guards.

Reported by:	cy
Fixes:	21e45c30c3
MFC after:	1 week
2023-07-20 22:04:03 +03:00
Konstantin Belousov
21e45c30c3 mmap(MAP_STACK): on stack grow, use original protection
If mprotect(2) changed protection in the bottom of the currently grown
stack region, currently the changed protection would be used for the
stack grow on next fault.  This is arguably unexpected.

Store the original protection for the entry at mmap(2) time in the
offset member of the gap vm_map_entry, and use it for protection of the
grown stack region.

PR:	272585
Reported by:	John F. Carr <jfc@mit.edu>
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41089
2023-07-20 17:11:42 +03:00
Doug Moore
6f251ef228 radix_trie: simplify ge, le lookups
Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.

The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced 20% in a buildworld test.

Reviewed by:	alc, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40936
2023-07-19 09:43:31 -05:00
Doug Moore
3e04ae433f vm_radix_init: use initializer
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D40971
2023-07-14 01:49:55 -05:00
Doug Moore
16e01c05c0 radix_trie: avoid code duplication in insert
Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.
Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40807
2023-07-09 15:06:02 -05:00
Doug Moore
8df38859d0 radix_trie: replace node count with popmap
Replace the 'count' field in a trie node with a bitmap that
identifies non-NULL children. Drop the 'last' field, and use the
last bit set in the bitmap instead.  In lookup_le, lookup_ge,
remove, and reclaim_all, use the bitmap to find the
previous/next/only/every non-null child in constant time by
examining the bitmask instead of looping across array elements
and null-checking them one-by-one.

A buildworld test suggests that this reduces the cycle count on
those functions that eliminate some null-checks by 4.9%, 1.5%,
0.0% and 13.3%.
Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40775
2023-07-07 11:09:36 -05:00
Konstantin Belousov
ef747607ea vm_fault: move FAULT_* return codes out of range for Mach errors
This way a possible clash between FAULT_* and KERN_* numbering is
avoided, and panics checks for fault_status confusion become more
efficient.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D40771
2023-06-28 00:03:14 +03:00
Doug Moore
da72505f9c radix_trie: pass fewer params to node_get
Let node_get calculate it's own owner value. Don't pass the count
parameter, since it's always 2. Save 16 bytes in insert(). Move,
without modifying, slot and trimkey to handle use-before-declaration
problem.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D40723
2023-06-27 12:21:11 -05:00