opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-05-19 16:35:42 -04:00

Author	SHA1	Message	Date
Konstantin Belousov	03aecce81c	tmpfs: dynamically register tmpfs pager (cherry picked from commit `28bc23ab92`)	2021-05-22 12:38:30 +03:00
Konstantin Belousov	2f5321c170	vm: Add KPI to dynamically register pagers (cherry picked from commit `b730fd30b7`)	2021-05-22 12:38:30 +03:00
Konstantin Belousov	bf9b8d2ae0	sys/vm: remove several other uses of OBJT_SWAP_TMPFS (cherry picked from commit `7079449b0b`)	2021-05-22 12:38:30 +03:00
Konstantin Belousov	324fbdb27a	vm_object_set_memattr(): handle all object types without listing them explicitly (cherry picked from commit `3e7a11ca21`)	2021-05-22 12:38:30 +03:00
Konstantin Belousov	4c4bb6da85	vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops (cherry picked from commit `00a3fe968b`)	2021-05-22 12:38:30 +03:00
Konstantin Belousov	55b68c9ac1	Constify vm_pager-related virtual tables. (cherry picked from commit `d474440ab3`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	2daf5ac2e5	Add OBJT_SWAP_TMPFS pager (cherry picked from commit `4b8365d752`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	f871a71a82	pagertab: use designated initializers (cherry picked from commit `0d2dfc6fed`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	6ecea720f3	Style enum obj_type (cherry picked from commit `838adc533f`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	da0e85f9eb	Implement vm_object_vnode() using vm_pager_getvp() (cherry picked from commit `a7c198a24b`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	76674f6896	Add pgo_freespace method (cherry picked from commit `1390a5cbeb`)	2021-05-22 12:38:29 +03:00
Konstantin Belousov	9a311cf995	Add pgo_getvp method (cherry picked from commit `192112b74f`)	2021-05-22 12:38:28 +03:00
Konstantin Belousov	2ad6fea032	Add pgo_mightbedirty method (cherry picked from commit `c23c555bc1`)	2021-05-22 12:38:28 +03:00
Konstantin Belousov	f3b6b7de3c	vm_pager: add pgo_set_writeable_dirty method (cherry picked from commit `180bcaa46c`)	2021-05-22 12:38:28 +03:00
Konstantin Belousov	12e1d859a6	vm_pager: style some wrappers (cherry picked from commit `ee4211bca6`)	2021-05-22 12:38:28 +03:00
Konstantin Belousov	951abff52f	swappagerops: slightly more style-compliant formatting (cherry picked from commit `a0850dd057`)	2021-05-22 12:38:28 +03:00
Mark Johnston	cf60931c32	fork: Suspend other threads if both RFPROC and RFMEM are not set Otherwise, a multithreaded parent process may trigger races in vm_forkproc() if one thread calls rfork() with RFMEM set and another calls rfork() without RFMEM. Also simplify vm_forkproc() a bit; vmspace_unshare() already checks whether the address space is shared. Reported by: syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com Reported by: syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30220 (cherry picked from commit `9246b3090c`)	2021-05-20 09:16:47 -04:00
Mark Johnston	4a6c5c8f59	swap_pager: Zero swap info before exporting to userspace Otherwise padding bytes are leaked. Reported by: KMSAN Sponsored by: The FreeBSD Foundation (cherry picked from commit `06d1fd9f42`)	2021-05-19 09:32:18 -04:00
Alexander Motin	555baef969	Improve UMA cache reclamation. When estimating working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, on a vm_lowmem event ZFS can free a few gigabytes of memory to UMA in one call, but that does not mean it will request the same amount back as quickly; in fact, it won't. Update the working set size on every reclamation call, shrinking caches faster under pressure. Without this, repeated vm_lowmem events squeezed more and more memory out of real consumers only for it to get stuck in UMA caches. I saw ZFS halve its ARC size before the previous algorithm, after a periodic WSS update, decided to reclaim UMA caches. Introduce voluntary reclamation of UMA caches not used for a long time. For each zdom, track a long-term minimum cache size watermark, freeing some unused items every UMA_TIMEOUT after the first 15 minutes without cache misses. Freed memory can be put to better use by other consumers. For example, ZFS won't grow its ARC unless it sees free memory, since it cannot tell that the cached memory is not really in use. And even if the memory is not needed, periodic freeing during idle periods should reduce its fragmentation. Reviewed by: markj, jeff (previous version) MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29790 (cherry picked from commit `2760658b21`)	2021-05-15 22:10:48 -04:00
Mark Johnston	0758fa13b4	uma: Introduce per-domain reclamation functions Make it possible to reclaim items from a specific NUMA domain. - Add uma_zone_reclaim_domain() and uma_reclaim_domain(). - Permit parallel reclamations. Use a counter instead of a flag to synchronize with zone_dtor(). - Use the zone lock to protect cache_shrink() now that parallel reclaims can happen. - Add a sysctl that can be used to trigger reclamation from a specific domain. Currently the new KPIs are unused, so there should be no functional change. Reviewed by: mav Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29685 (cherry picked from commit `aabe13f145`)	2021-04-28 10:00:52 -04:00
Mark Johnston	6eddb6822c	uma: Split bucket_cache_drain() to permit per-domain reclamation Note that the per-domain variant does not shrink the target bucket size. No functional change intended. Sponsored by: The FreeBSD Foundation (cherry picked from commit `54f421f9e8`)	2021-04-28 10:00:41 -04:00
Konstantin Belousov	44a0bdad29	sysctl vm.objects: report backing object and swap use (cherry picked from commit `ecfbddf0cd`)	2021-04-23 14:14:11 +03:00
Konstantin Belousov	c28f5f9b3e	Add sysctl debug.uma_reclaim (cherry picked from commit `89619b747b`)	2021-04-11 03:35:16 +03:00
Konstantin Belousov	b007a05b5a	Style (cherry picked from commit `51a7be5f60`)	2021-04-07 06:32:40 +03:00
Mark Johnston	2e08308d62	vm_fault: Shoot down multiply mapped COW source page mappings Reviewed by: kib, rlibby Discussed with: alc Approved by: so Security: CVE-2021-29626 Security: FreeBSD-SA-21:08.vm (cherry picked from commit `982693bb72`)	2021-04-06 14:50:46 -04:00
Konstantin Belousov	b2ebf64aae	vm_fault: handle KERN_PROTECTION_FAILURE (cherry picked from commit `c7b913aa47`)	2021-04-03 03:39:06 +03:00
Kristof Provost	acc600ea32	uma: allow uma_zfree_pcpu(..., NULL) We already allow free(NULL) and uma_zfree(..., NULL). Make uma_zfree_pcpu(..., NULL) work as well. This also means that counter_u64_free(NULL) will work. These make cleanup code simpler. MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29189 (cherry picked from commit `b8f7267d49`)	2021-03-19 23:40:07 +01:00
Mark Johnston	03984bdfa0	vm: Round up npages and alignment for contig reclamation When searching for runs to reclaim, we need to ensure that the entire run will be added to the buddy allocator as a single unit. Otherwise, it will not be visible to vm_phys_alloc_contig() as it is currently implemented. This is a problem for allocation requests that are not a power of 2 in size, as with 9KB jumbo mbuf clusters. Reported by: alc Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28924 (cherry picked from commit `0401989282`)	2021-03-16 11:14:09 -04:00
Mark Johnston	cec3990d34	vm_reserv: Fix list locking in vm_reserv_reclaim_contig() The per-domain partpop queue is locked by the combination of the per-domain lock and individual reservation mutexes. vm_reserv_reclaim_contig() scans the queue looking for partially populated reservations that can be reclaimed in order to satisfy the caller's allocation. During the scan, we drop the per-domain lock. At this point, the rvn pointer may be invalidated. Take care to load rvn after re-acquiring the per-domain lock. While here, simplify the condition used to check whether a reservation was dequeued while the per-domain lock was dropped. Reviewed by: alc, kib Reported by: gallatin Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29203 (cherry picked from commit `968079f253`)	2021-03-13 20:09:29 -05:00
Mark Johnston	ba0d063cd6	uma: Update the comment above startup_alloc() to reflect reality The scheme used for early slab allocations changed in commit `a81c400e75`. Reported by: alc Reviewed by: alc (cherry picked from commit `537f92cd35`)	2021-02-28 19:31:58 -05:00
Mark Johnston	0486986ad8	vm_kern: Avoid sign extension in the KVA_QUANTUM definition Otherwise, on a powerpc64 NUMA system with hashed page tables, the first-level superpage reservation size is large enough that the value of the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is negative and gets sign-extended when passed to vmem_set_import(). This results in a boot-time hang on such platforms. Reported by: bdragon (cherry picked from commit `23e875fd97`)	2021-02-25 08:56:54 -05:00
Mark Johnston	a73aaaeb57	vm: Honour the "noreuse" flag to vm_page_unwire_managed() This flag indicates that the page should be enqueued near the head of the inactive queue, skipping the LRU queue. It is used when unwiring pages from the buffer cache following direct I/O or after I/O when POSIX_FADV_NOREUSE or _DONTNEED advice was specified, or when sendfile(SF_NOCACHE) completes. For the direct I/O and sendfile cases we only enqueue the page if we decide not to free it, typically because it's mapped. Pass "noreuse" through to vm_page_release_toq() so that we actually honour the desired LRU policy for these scenarios. Reported by: bdrewery Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D28555 (cherry picked from commit `5c18744ea9`)	2021-02-12 19:25:05 -05:00
Konstantin Belousov	420d4be3e4	vm_map_protect(): remove unneeded recalculations of new_prot, new_maxprot Requested by: alc Sponsored by: The FreeBSD Foundation	2021-01-14 10:02:43 +02:00
Konstantin Belousov	0659df6fad	vm_map_protect: allow setting prot and max_prot in one go. This prevents a situation where another thread modifies map entry permissions between setting max_prot, relocking, and setting prot, confusing the outcome of the operation. For example, you can get an error that is not possible if the operation is performed atomically. Also allow setting rwx for max_prot even if the map does not allow setting effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117	2021-01-13 01:35:22 +02:00
Konstantin Belousov	9402bb44f1	vmspace_fork: preserve wx settings in the child vm map after fork Noted by: markj Sponsored by: The FreeBSD Foundation	2021-01-12 08:09:59 +02:00
Konstantin Belousov	2e1c94aa1f	Implement enforcement of the write XOR execute mapping policy. vm_map_insert() and vm_map_protect() check that PROT_WRITE | PROT_EXEC are never specified together if the vm_map has the MAP_WX flag set. A FreeBSD control flag allows a specific binary to request a WX exemption, and per-ABI boolean sysctls kern.elf{32,64}.allow_wx enable or disable the policy globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Mark Johnston	663de81f85	uma: Avoid unmapping direct-mapped slabs startup_alloc() uses pmap_map() to map slabs used for bootstrapping the VM. pmap_map() may ignore the hint address and simply return a range from the direct map. In this case we must not unmap the range in startup_free(). UMA uses bootstart and bootmem to track the range of KVA into which slabs are mapped if the direct map is not used. Unmap a startup slab only if it was mapped into that range. Reported by: alc Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27885	2021-01-03 11:50:31 -05:00
Ryan Libby	942951ba46	uma dbg: catch more corruption with atomics Use atomic testandset and testandclear to catch concurrent double frees, and to reduce the number of atomic operations. Submitted by: jeff Reviewed by: cem, kib, markj (all previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22703	2020-12-31 13:02:45 -08:00
Mark Johnston	81846def34	vm: Fix some bugs in the page busying code In vm_page_busy_acquire(), load the object pointer using atomic_load_ptr() as we do elsewhere. Per the comment, the object identity must be consistent across sleeps. In vm_page_grab_sleep(), pass the correct pindex to _vm_page_busy_sleep(). The pindex is used to re-check the page's identity before going to sleep. In particular, vm_page_grab_sleep() is used in unlocked grab, so the object lock is not necessarily held when verifying the page's identity, and the pindex may change if the page is moved, or freed and re-allocated. I believe this can result in spurious VM_PAGER_FAILs from vm_page_grab_valid_unlocked() or early termination of vm_page_grab_pages_unlocked(). In vm_page_grab_pages(), pass the correct pindex to vm_page_grab_sleep(). Otherwise I believe vm_page_grab_pages() will effectively spin when attempting to busy a busy page after the first index in the range. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27607	2020-12-27 17:01:44 -05:00
Mark Johnston	d2f1c44bc9	uma: Remove the MINBUCKET flag from the flag name list This should have been done in r368399 / commit `f8b6c51538`. Reported by: rlibby Sponsored by: The FreeBSD Foundation	2020-12-27 17:01:33 -05:00
Bryan Drewery	5fee468e83	Revert r368523 which fixed contig allocs waiting forever. This needs to account for empty NUMA domains or domains which do not satisfy the requested range. Discussed with: markj	2020-12-15 19:38:16 +00:00
Bryan Drewery	bbfec1633b	contig allocs: Don't retry forever on M_WAITOK. This restores behavior from before domain iterators were added in r327895 and r327896. The vm_domainset_iter_policy() will do a vm_wait_doms() and then restart its iterator when M_WAITOK is set. It will also force the containing loop to have M_NOWAIT. So we get an unbounded retry loop rather than the intended bounded retries that kmem_alloc_contig_pages() already handles. This also restores M_WAITOK to the vmem_alloc() call in kmem_alloc_attr_domain() and kmem_alloc_contig_domain(). Reviewed by: markj, kib MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D27507	2020-12-10 20:44:29 +00:00
Mark Johnston	e574d407ae	uma: Make uma_zone_set_maxcache() work better with small limits The old implementation chose the largest bucket zone such that if the per-CPU caches are fully populated, the total number of items cached is no larger than the specified limit. If no such zone existed, UMA would not do any caching. We can now use uz_bucket_size_max to set a precise limit on the number of items in a zone's bucket, so the total size of per-CPU caches can be bounded more easily. Implement a new policy in uma_zone_set_maxcache(): choose a bucket size such that up to half of the limit can be cached in per-CPU caches, with the rest going to the full bucket cache. This fixes a problem with the kstack_cache zone: the limit of 4 * mp_ncpus items meant that the zone would not do any caching, defeating the whole purpose of the zone. That's because the smallest bucket size holds up to 2 items and we may cache up to 3 full buckets per CPU, and 2 * 3 * mp_ncpus > 4 * mp_ncpus. Reported by: mjg Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27168	2020-12-06 22:45:50 +00:00
Mark Johnston	f8b6c51538	uma: Enforce the use of uz_bucket_size_max in the free path uz_bucket_size_max is the maximum permitted bucket size. When filling a new bucket to satisfy uma_zalloc(), the bucket is populated with at most uz_bucket_size_max items. The maximum number of entries in the bucket may be larger. When freeing items, however, we will fill per-CPU buckets up to their maximum number of entries, potentially exceeding uz_bucket_size_max. This makes it difficult to precisely limit the number of items that may be cached in a zone. For example, if one wants to limit buckets to 1 entry for a particular zone, that's not possible since the smallest bucket holds up to 2 entries. Try to solve the problem by using uz_bucket_size_max to limit the number of entries in a bucket. Note that the ub_entries field is initialized upon every bucket allocation. Most zones are not affected since they do not impose any specific limit on the maximum bucket size. While here, remove the UMA_ZONE_MINBUCKET flag. It was unused and we now have uma_zone_set_maxcache() to control the zone's cache size more precisely. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27167	2020-12-06 22:45:39 +00:00
Mark Johnston	8a6776ca0f	uma: Use atomic load for uz_sleepers This field is updated locklessly. Sponsored by: The FreeBSD Foundation	2020-12-06 22:45:22 +00:00
Mark Johnston	991f23ef20	uma: Avoid allocating buckets with the cross-domain lock held Allocation of a bucket can trigger a cross-domain free in the bucket zone, e.g., if the per-CPU alloc bucket is empty, we free it and get migrated to a remote domain. This can lead to deadlocks since a bucket zone may allocate buckets from itself or a pair of bucket zones could be allocating from each other. Fix the problem by dropping the cross-domain lock before allocating a new bucket and handling refill races. Use a list of empty buckets to ensure that we can make forward progress. Reported by: imp, mjg (witness(9) warnings) Discussed with: jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27341	2020-11-30 16:18:33 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS with the runtime variable maxphys. It is initialized from MAXPHYS by default, but can also be adjusted with the tunable kern.maxphys. Make the b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and size b_pages[] for pbufs to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, especially when such buffers come from userspace. Overall, we save a significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to the desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except the place that initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted directly. Some drivers, which use MAXPHYS to size embedded structures, get a private MAXPHYS-like constant; their conversion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, and dev/siis were either submitted by, or based on changes by, mav. Suggested by: mav Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Mark Johnston	1fea4b25c9	Wrap a long line in vm_pqbatch_process_page()	2020-11-19 15:41:42 +00:00
Mark Johnston	9e3e737608	Micro-optimize vm_page_pqbatch_submit() Avoid calling vm_page_domain() twice. Discussed with: alc (in D27207)	2020-11-19 15:40:58 +00:00
Mark Johnston	431fb8abd7	vm_phys: Try to clean up NUMA KPIs It can be useful for code outside the VM system to look up the NUMA domain of a page backing a virtual or physical address, specifically when creating NUMA-aware data structures. We have _vm_phys_domain() for this, but the leading underscore implies that it's an internal function, and vm_phys.h has dependencies on a number of other headers. Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to vm_phys_domain(). Make the latter an inline function. Add _vm_phys.h and define struct vm_phys_seg there so that it's easier to use in other headers. Include it from vm_page.h so that vm_page_domain() can be defined there. Include machine/vmparam.h from _vm_phys.h since it depends directly on some constants defined there. Reviewed by: alc Reviewed by: dougm, kib (earlier versions) Differential Revision: https://reviews.freebsd.org/D27207	2020-11-19 03:59:21 +00:00
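Commit 4a6c5c8f59 zeroes swap info before exporting it, because otherwise struct padding bytes leak stale kernel memory to userspace. The sketch below shows the general pattern in user-space C. struct swap_info_sketch and export_swap_info() are hypothetical stand-ins and are not the swap_pager types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical exported record with a padding hole after `flags`. */
struct swap_info_sketch {
	uint8_t  flags;		/* 1 byte, followed by alignment padding */
	uint64_t nblks;
};

/*
 * Fill `out` before copying it to userspace. Zeroing the whole struct
 * first ensures the padding bytes do not carry stale memory contents.
 */
void
export_swap_info(struct swap_info_sketch *out, uint8_t flags, uint64_t nblks)
{
	memset(out, 0, sizeof(*out));
	out->flags = flags;
	out->nblks = nblks;
}
```

Assigning the members alone would leave the padding between `flags` and `nblks` holding whatever was there before. KMSAN reports exactly this kind of leak.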
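Commit 03984bdfa0 rounds npages up so that a reclaimed run reaches the buddy allocator as a single unit. It uses 9KB jumbo mbuf clusters as the example. The following is a minimal sketch of that arithmetic, assuming 4 KB pages. The helper names are hypothetical, and the alignment rounding that the commit also performs is not shown.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed 4 KB base page */

/* Round n (n > 0) up to the next power of two. */
static unsigned long
roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/*
 * Number of pages a contiguous reclamation run should cover so that it
 * can be freed to a buddy allocator as one power-of-two block.
 */
unsigned long
reclaim_run_pages(size_t bytes)
{
	unsigned long npages =
	    (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return (roundup_pow2(npages));
}
```

A 9216-byte request needs 3 pages. A 3-page run cannot be represented as a single buddy block, so the sketch searches for a 4-page run instead.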
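Commit 0486986ad8 describes an import quantum that came out negative and was then sign-extended when passed to vmem_set_import(). The sketch below only illustrates the C conversion involved. The real KVA_NUMA_IMPORT_QUANTUM expression is not reproduced here; INT_MIN stands in for the negative value. Computing the quantum in an unsigned 64-bit-safe type is one way to avoid the problem, and is not necessarily the exact change the commit made.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an int-typed quantum whose value ended up
 * negative (bit pattern 0x80000000).
 */
static const int bad_quantum = INT_MIN;

/* Converting a negative int to a 64-bit unsigned type sign-extends it. */
uint64_t
widen_signed(void)
{
	return ((uint64_t)bad_quantum);
}

/* Doing the shift in an unsigned long keeps the intended magnitude. */
uint64_t
widen_unsigned(void)
{
	return ((uint64_t)(1UL << 31));
}
```

The sign-extended value, 0xFFFFFFFF80000000, is an absurdly large import size. That is consistent with the boot-time hang the commit reports.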
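Commit 2e1c94aa1f states the W^X rule: when a map enforces the policy, PROT_WRITE and PROT_EXEC must never be requested together. A minimal sketch of that check, using the standard POSIX protection bits. wx_allowed() is a hypothetical helper, and `map_enforces_wx` stands in for the vm_map MAP_WX flag.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * Return true if `prot` is acceptable for a map. When the map enforces
 * W^X, writable and executable must not both be set.
 */
bool
wx_allowed(int prot, bool map_enforces_wx)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	if (!map_enforces_wx)
		return (true);
	return ((prot & wx) != wx);
}
```

Per the commit message, this test applies both when a mapping is inserted and when its protection is changed later. Checking only one path would let a W+X mapping be created in two steps.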
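Commit 942951ba46 switches UMA debugging to atomic test-and-set and test-and-clear operations. With a single read-modify-write, exactly one of two concurrent frees of the same item observes the bit in its prior state, so a racing double free is caught. This sketch uses C11 atomics on a hypothetical 64-item bitmap. It assumes the convention "a set bit means the item is free", which is not taken from the UMA sources.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical debug bitmap for a slab of up to 64 items; set = free. */
static _Atomic unsigned long long dbg_free_bits;

/* Mark item idx (< 64) free; return false on a duplicate free. */
bool
dbg_free(unsigned idx)
{
	unsigned long long bit = 1ULL << idx;
	unsigned long long old = atomic_fetch_or(&dbg_free_bits, bit);

	return ((old & bit) == 0);
}

/* Mark item idx (< 64) allocated; return false if it was not free. */
bool
dbg_alloc(unsigned idx)
{
	unsigned long long bit = 1ULL << idx;
	unsigned long long old = atomic_fetch_and(&dbg_free_bits, ~bit);

	return ((old & bit) != 0);
}
```

A separate load followed by a store would let two racing frees both see "allocated" and both succeed. The fetch-and-modify form also replaces two operations with one.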
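Commit e574d407ae justifies the new uma_zone_set_maxcache() policy with a short calculation. The sketch below replays it. As the message states, it assumes that each CPU may cache up to 3 full buckets and that the smallest old bucket holds 2 items. The function names are hypothetical and this is not the UMA implementation. In particular, the fallback to a bucket size of 1 is this sketch's own choice.

```c
#include <stdbool.h>

#define BUCKETS_PER_CPU	3	/* full buckets cacheable per CPU */
#define OLD_MIN_BUCKET	2	/* smallest bucket size, old policy */

/*
 * Old policy: caching happens only if fully populated per-CPU caches,
 * even with the smallest bucket, stay within the limit.
 */
bool
old_policy_caches(int limit, int ncpus)
{
	return (OLD_MIN_BUCKET * BUCKETS_PER_CPU * ncpus <= limit);
}

/*
 * New policy: aim for half of the limit in per-CPU caches. If that
 * rounds down to 0 but the limit still covers one item per bucket
 * slot, use a bucket size of 1.
 */
int
new_policy_bucket_size(int limit, int ncpus)
{
	int nb = BUCKETS_PER_CPU * ncpus;	/* bucket slots, all CPUs */
	int bsize = limit / nb / 2;

	if (bsize == 0 && limit / nb > 0)
		bsize = 1;
	return (bsize);
}
```

For kstack_cache with mp_ncpus = 8, the limit is 4 * 8 = 32. The old rule needs 2 * 3 * 8 = 48 items and therefore disables caching. The sketch instead picks a bucket size of 1, which bounds per-CPU caches at 24 items and leaves 8 for the full bucket cache.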
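Commit cd85379104 sizes pbuf b_pages[] as atop(maxphys) + 1 so that an unaligned maxphys-sized buffer still fits. The extra page comes from the buffer straddling one more page boundary. A minimal sketch of the page-span calculation, assuming 4 KB pages. The macro and function names are hypothetical, and the masks mirror what trunc_page/round_page do.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT	12			/* assumed 4 KB pages */
#define SK_PAGE_SIZE	(1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)

/* Number of pages touched by [addr, addr + len), for len > 0. */
unsigned long
pages_spanned(uintptr_t addr, unsigned long len)
{
	/* Truncate the start down to a page boundary. */
	uintptr_t first = addr & ~(uintptr_t)SK_PAGE_MASK;
	/* Round the end up to a page boundary. */
	uintptr_t last = (addr + len + SK_PAGE_MASK) &
	    ~(uintptr_t)SK_PAGE_MASK;

	return ((unsigned long)((last - first) >> SK_PAGE_SHIFT));
}
```

With maxphys = 1M, a page-aligned buffer covers 256 pages. Shifting the start by 16 bytes makes it cover 257 pages, which is atop(maxphys) + 1.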


4529 commits