Commit graph

151953 commits

Author SHA1 Message Date
Andrew Turner
7a345763f9 arm64: Expand the use of Armv8.1-A atomics
When targeting Armv8.1 we can assume FEAT_LSE is available and can use
the atomic instructions this provides without needing to check for
support first.

Reviewed by:	imp, markj, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46159
2024-08-19 10:53:12 +00:00
Andrew Turner
87940d2b33 buf_ring: Add an Arm copyright
I've changed enough of this file to add Arm as a copyright holder.
Add it after the "All rights reserved" line as that's not needed.

Reviewed by:	imp
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46157
2024-08-19 10:53:12 +00:00
Andrew Turner
fe2445f47d buf_ring: Ensure correct ordering of loads
When enqueueing on an architecture with a weak memory model ensure
the loads of br->br_prod_head and br->br_cons_tail are ordered correctly.

If br_cons_tail is loaded first then other threads may perform a
dequeue and enqueue before br_prod_head is loaded. This will mean the
tail is one less than it should be and the code under the
prod_next == cons_tail check could incorrectly be skipped.

buf_ring_dequeue_mc has the same issue with br->br_prod_tail and
br->br_cons_head so needs the same fix.

Reported by:	Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Reviewed by:	imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46155
2024-08-19 10:53:11 +00:00
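
The load ordering described in fe2445f47d can be sketched roughly as follows. This is a minimal illustration using C11 atomics rather than the kernel's atomic(9) primitives; `struct ring_sketch` and `ring_sketch_full()` are hypothetical names, not the real buf_ring code, and the full check uses the simple masked form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down ring header; not the real struct buf_ring. */
struct ring_sketch {
	_Atomic uint32_t prod_head;
	_Atomic uint32_t cons_tail;
	uint32_t mask;		/* ring size - 1; size is a power of two */
};

/* Report whether the ring looks full from a producer's point of view. */
static bool
ring_sketch_full(struct ring_sketch *r)
{
	uint32_t prod_head, prod_next, cons_tail;

	assert(((r->mask + 1) & r->mask) == 0);

	/* Load the producer head first, with acquire semantics ... */
	prod_head = atomic_load_explicit(&r->prod_head, memory_order_acquire);
	/* ... so this later load cannot be reordered ahead of it. */
	cons_tail = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

	prod_next = (prod_head + 1) & r->mask;
	return (prod_next == (cons_tail & r->mask));
}
```

The acquire on the first load is what fixes the order; on a strongly ordered machine the two plain loads would already be observed in program order.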
Andrew Turner
947754af55 buf_ring: Use atomic operations with br_prod_tail
As with br_cons_tail use an atomic load acquire to read br_prod_tail
in buf_ring_dequeue_mc and buf_ring_peek*.

On dequeue we need to ensure we don't read the entry from the buf_ring
until it is available and prod_tail has updated. There is already an
appropriate store in the enqueue path and an appropriate load in the
single consumer dequeue, we just need one in the other functions that
read from the buf_ring.

Reviewed by:	imp, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46154
2024-08-19 10:53:11 +00:00
Andrew Turner
7eb0fffc77 buf_ring: Remove old arm-only dequeue code
In the single consumer dequeue the consumer thread controls
br_cons_head. As such no ordering between this and other data is
required.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46153
2024-08-19 10:53:11 +00:00
Andrew Turner
44e1cfca41 buf_ring: Use atomic operations with br_cons_tail
Use an atomic operation with a memory barrier loading br_cons_tail
from the producer thread and storing to it in the consumer thread.

On dequeue we need to read the pointer value from the buf_ring before
moving the consumer tail as that indicates the entry is available to be
used. The store release atomic operation guarantees this.

In the enqueueing thread we then need to use a load acquire atomic
operation to ensure writing to this entry can only happen after the
tail has been read and checked.

Reported by:	Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Reviewed by:	markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46152
2024-08-19 10:53:11 +00:00
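
The store-release / load-acquire pairing described in 44e1cfca41 (and mirrored for br_prod_tail in 947754af55) might look like the sketch below. It is a single-producer, single-consumer simplification written with C11 atomics; `struct spsc_sketch`, `spsc_enqueue()` and `spsc_dequeue()` are hypothetical and much simpler than the real buf_ring functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SPSC_SIZE 4u
static_assert((SPSC_SIZE & (SPSC_SIZE - 1)) == 0, "size must be a power of two");

struct spsc_sketch {
	_Atomic uint32_t prod_tail;	/* next slot the producer will fill */
	_Atomic uint32_t cons_tail;	/* next slot the consumer will read */
	void *slots[SPSC_SIZE];
};

static int
spsc_enqueue(struct spsc_sketch *q, void *item)
{
	uint32_t prod = atomic_load_explicit(&q->prod_tail, memory_order_relaxed);
	/* Acquire: the slot write below cannot happen before this check. */
	uint32_t cons = atomic_load_explicit(&q->cons_tail, memory_order_acquire);

	if (prod - cons == SPSC_SIZE)
		return (-1);		/* full */
	q->slots[prod & (SPSC_SIZE - 1)] = item;
	/* Release: publish the slot contents before the new tail. */
	atomic_store_explicit(&q->prod_tail, prod + 1, memory_order_release);
	return (0);
}

static void *
spsc_dequeue(struct spsc_sketch *q)
{
	uint32_t cons = atomic_load_explicit(&q->cons_tail, memory_order_relaxed);
	/* Acquire: do not read the slot until prod_tail shows it is filled. */
	uint32_t prod = atomic_load_explicit(&q->prod_tail, memory_order_acquire);
	void *item;

	if (cons == prod)
		return (NULL);		/* empty */
	item = q->slots[cons & (SPSC_SIZE - 1)];
	/* Release: finish reading the slot before handing it back. */
	atomic_store_explicit(&q->cons_tail, cons + 1, memory_order_release);
	return (item);
}
```

Each release store pairs with the acquire load of the same variable on the other side, so a slot is never overwritten while it is still being read, and never read before it has been written.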
Andrew Turner
3cc603909e buf_ring: Keep the full head and tail values
If a thread reads the head but then sleeps for long enough that
another thread fills the ring and leaves the new head with the
expected value then the cmpset can pass when it should have failed.

To work around this keep the full head and tail value and use the
upper bits as a generation count.

Reviewed by:	kib
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46151
2024-08-19 10:53:11 +00:00
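
The generation-count idea in 3cc603909e can be illustrated with a small sketch, assuming a 32-bit counter whose low bits select the slot. `gen_index()` and `gen_claim()` are hypothetical helpers written with C11 atomics, not the kernel's cmpset code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GEN_RING_SIZE 8u
#define GEN_RING_MASK (GEN_RING_SIZE - 1)
static_assert((GEN_RING_SIZE & GEN_RING_MASK) == 0, "size must be a power of two");

/* Slot index: the low bits of the full counter. */
static uint32_t
gen_index(uint32_t counter)
{
	return (counter & GEN_RING_MASK);
}

/*
 * Try to advance the head from the value this thread saw earlier.
 * Because the full counter is compared, the exchange fails if the head
 * has moved at all, even by exactly one lap of the ring.
 */
static bool
gen_claim(_Atomic uint32_t *head, uint32_t seen)
{
	uint32_t expected = seen;

	return (atomic_compare_exchange_strong(head, &expected, seen + 1));
}
```

If only the masked index were stored, a head that had advanced by `GEN_RING_SIZE` would compare equal to the stale value and the exchange would wrongly succeed; the upper bits act as the generation count that tells the two apart.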
Andrew Turner
17a597bc13 buf_ring: Consistently use atomic_*_32
We are operating on uint32_t values, use uint32_t atomic functions.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46150
2024-08-19 10:04:25 +01:00
Andrew Turner
d3d34d56be buf_ring: Support DEBUG_BUFRING in userspace
The only part of DEBUG_BUFRING we don't support in userspace is the
mutex checks. Add _KERNEL checks around these so we can enable the
extra debugging.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46149
2024-08-19 10:04:25 +01:00
Andrew Turner
5048308bdb buf_ring: Remove PREFETCH_DEFINED
I'm not able to find anything in the tree that ever defined it. Remove
as it's unused so is untested.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46148
2024-08-19 10:04:24 +01:00
Eugene Grosbein
8132e95909 libalias: fix subtle racy problem in outside-inside forwarding
sys/netinet/libalias/alias_db.c has internal static function UseLink()
that passes a link to CleanupLink() to verify if the link has expired.
If so, UseLink() may return NULL.

_FindLinkIn()'s usage of UseLink() is not quite correct.

Assume there is "redirect_port udp" configured to forward incoming
traffic for specific port to some internal address.
Such a rule creates partially specified permanent link.

After first such packet libalias creates new fully specified
temporary LINK_UDP with default timeout 60 seconds.
Also, in case of low traffic libalias may assign "timestamp"
for this new temporary link way in the past because
LibAliasTime is updated seldom and can keep old value
for tens of seconds, and it will be used for the temporary link.

It may happen that next incoming packet for redirected port
passed to _FindLinkIn() results in a call to UseLink()
that returns NULL due to detected expiration.
Immediate return of NULL results in broken translation:
either a packet is dropped (deny_incoming mode) or delivered to
original destination address instead of internal one.

Fix it with additional check for NULL to proceed with a search
for original partially specified link. In case of UDP,
it also recreates temporary fully specified link
with a call to ReLink().

Practical examples are "redirect_port udp" rules for unidirectional
SYSLOG protocol (port 514) or some low volume VPN encapsulated in UDP.

Thanks to Peter Much for initial analysis and first version of a patch.

Reported by:	Peter Much <pmc@citylink.dinoex.sub.org>
PR:		269770
MFC after:	1 week
2024-08-19 10:34:37 +07:00
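
The control-flow change described in 8132e95909 can be sketched as below. All types and names here (`struct link_sketch`, `use_link_sketch()`, `find_link_in_sketch()`) are hypothetical stand-ins; the real code in alias_db.c works on hash tables and also recreates the temporary link with ReLink(), which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link record: a permanent link never expires. */
struct link_sketch {
	int expire_at;
	int permanent;
};

/* Like UseLink(): return the link if still valid, NULL once it has expired. */
static struct link_sketch *
use_link_sketch(struct link_sketch *lnk, int now)
{
	assert(lnk != NULL);
	if (!lnk->permanent && now >= lnk->expire_at)
		return (NULL);
	return (lnk);
}

/* Look up the fully specified link first, then the partial one. */
static struct link_sketch *
find_link_in_sketch(struct link_sketch *full, struct link_sketch *partial,
    int now)
{
	struct link_sketch *lnk;

	if (full != NULL) {
		lnk = use_link_sketch(full, now);
		if (lnk != NULL)
			return (lnk);
		/* Expired: do not return NULL yet, keep searching. */
	}
	if (partial != NULL)
		return (use_link_sketch(partial, now));
	return (NULL);
}
```

The point is the fall-through: an expired temporary link no longer ends the lookup, so the permanent, partially specified link created by the redirect rule is still found.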
Navdeep Parhar
0a9d1da6e6 cxgbe(4): Stop work request queues in a reliable manner.
Clear the EQ_HW_ALLOCATED flag with the wrq lock held and discard all
work requests, pending or new, when it's not set.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-08-17 11:23:32 -07:00
Navdeep Parhar
b5332809c6 cxgbe/iw_cxgbe: Fix typo in assertion.
eanbled -> enabled

MFC after:	3 days
2024-08-17 10:38:36 -07:00
Rick Macklem
10d5b43424 nfsproto.h: Define the new mode_umask attribute
RFC8275 defines a new attribute as an extension to NFSv4.2
called MODE_UMASK.  This patch adds the attribute number
to nfsproto.h.

Future patches will add optional support for the attribute.
This patch does not cause any semantics change.

MFC after:	2 weeks
2024-08-16 17:40:52 -07:00
Kajetan Staszkiewicz
788f194f60 pf: 'sticky-address' requires 'keep state'
When route_to() processes a packet without state, pf_map_addr() is called for
each packet. Pf_map_addr() will search for a source node and will find none
since those are created only in pf_create_state(). Thus sticky address,
even though requested in rule definition, will never work.

Raise an error when a stateless filter rule uses sticky address to avoid
confusion and to keep ruleset limitations in sync with what the pf code
really does.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D46310
2024-08-16 11:43:00 +02:00
Warner Losh
3d89acf590 nvme: Separate total failures from I/O failures
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 21:31:20 -06:00
Warner Losh
ce7fac64ba Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert
it and do it again.

This reverts commit d5507f9e43.

Sponsored by:		Netflix
2024-08-15 21:29:53 -06:00
Warner Losh
d5507f9e43 nvme: Separate total failures from I/O failures
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 20:22:18 -06:00
Kevin Lo
8b21c469db ng_ubt: Add blacklist entries for MediaTek MT7925
This controller requires firmware patch downloading to operate,
block ng_ubt attachment unless operational firmware is loaded.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D46302
2024-08-16 10:03:19 +08:00
Jessica Clarke
3cded05922 tmpfs: Fix OOB write when setting vfs.tmpfs.memory_percent
tmpfs_mem_percent is an int not a long, so on a 64-bit system this
writes 4 bytes past the end of the variable. The read above is correct,
so this was likely a copy paste error from sysctl_mem_reserved.

Found by:	CHERI
Fixes:		636592343c ("tmpfs: increase memory reserve to a percent of available memory + swap")
2024-08-15 20:33:22 +01:00
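
The class of bug fixed in 3cded05922 can be illustrated with a short sketch. The handler below is hypothetical and is not the tmpfs sysctl code; it only shows a store made through a pointer that matches the target's declared type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical setter in the style of a sysctl handler that receives its
 * target as an untyped pointer.
 */
static void
set_int_tunable_sketch(void *arg1, int value)
{
	int *dst;

	assert(arg1 != NULL);
	/*
	 * Use the target's declared type.  Storing through a "long *" here
	 * would write 8 bytes into a 4-byte int on a 64-bit (LP64) system.
	 */
	dst = arg1;
	*dst = value;
}
```

With the correctly typed pointer the store touches exactly `sizeof(int)` bytes, so whatever sits next to the variable in memory is left alone.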
Pierre Pronchery
ef9fc9609a sys: Mark ACL conversion routines as __result_use_check
Both acl_copy_oldacl_into_acl() and acl_copy_acl_into_oldacl() may fail
in some circumstances (e.g., acl.acl_cnt exceeding the capacity of
OLDACL_MAX_ENTRIES).  This change marks both routines with
__result_use_check, enforcing check for errors by the caller.

Suggested by:	markj
Reviewed by:	markj, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46254
2024-08-15 15:04:29 -04:00
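
As a rough illustration of what ef9fc9609a buys, the sketch below marks a fallible copy routine so that GCC and Clang warn when its result is ignored. `RESULT_USE_CHECK_SKETCH` is a stand-in that assumes `__result_use_check` maps to the `warn_unused_result` attribute on those compilers; the function and the capacity constant are hypothetical, not the real ACL routines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for __result_use_check (assumption: GCC/Clang attribute). */
#define RESULT_USE_CHECK_SKETCH __attribute__((__warn_unused_result__))

#define SKETCH_MAX_ENTRIES 32	/* hypothetical capacity of the old format */

/* Copy cnt entries, failing if they do not fit the destination format. */
RESULT_USE_CHECK_SKETCH static int
copy_entries_sketch(int *dst, const int *src, int cnt)
{
	assert(dst != NULL && src != NULL);
	if (cnt < 0 || cnt > SKETCH_MAX_ENTRIES)
		return (EINVAL);
	for (int i = 0; i < cnt; i++)
		dst[i] = src[i];
	return (0);
}
```

A caller that writes `copy_entries_sketch(dst, src, n);` and discards the result now gets a compiler warning, which is the kind of mistake 6ee6c7b146 had to fix by hand in acl_copyin.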
Mark Johnston
bef079254f arm64: Clamp segment sizes properly in bounce_bus_dmamap_load_buffer()
Commit 099b595154 ("Improve loading of multipage aligned buffers.")
modified bounce_bus_dmamap_load_buffer() with the assumption that busdma
memory allocations are physically contiguous, which is not always true:
bounce_bus_dmamem_alloc() will allocate memory with
kmem_alloc_attr_domainset() in some cases, and this function is not
guaranteed to return contiguous memory.

The damage seems to have been mitigated for most consumers by clamping
the segment size to maxsegsz, but this was removed in commit
a77e1f0f81 ("busdma: better handling of small segment bouncing"); in
practice, it seems busdma memory is often allocated with maxsegsz ==
PAGE_SIZE.  In particular, after commit a77e1f0f81 I see occasional
random kernel memory corruption when benchmarking TCP through mlx5
interfaces.

Fix the problem by using separate flags for contiguous and
non-contiguous busdma memory allocations, and using that to decide
whether to clamp.

Fixes:	099b595154 ("Improve loading of multipage aligned buffers.")
Fixes:	a77e1f0f81 ("busdma: better handling of small segment bouncing")
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D46238
2024-08-15 14:19:22 +00:00
Cheng Cui
8cc528c682 tcp cc: clean up some un-used cc_var flags
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46299
2024-08-15 09:33:04 -04:00
Martin Matuska
29dc934914 zfs: merge openzfs/zfs@d2ccc2155
Notable upstream pull request merges:
 #16431 244ea5c48 Add missing kstats to dataset kstats

Obtained from:	OpenZFS
OpenZFS commit: d2ccc21552
2024-08-15 13:30:31 +02:00
Kajetan Staszkiewicz
1fc0dac54c pf: Convert struct pf_addr_wrap before sending it over netlink
The struct pf_addr_wrap when used inside of kernel operates on pointers to
tables or interfaces. When reading a ruleset the struct must contain
counters calculated from the aforementioned tables and interfaces. Both the
pointers and the resulting counters are stored in an union and thus can't be
present in the struct at the same time.

The original ioctl code handles this by making a copy of struct pf_addr_wrap
for pool addresses, accessing the table or interface structures by their
pointers, calculating the counter values and storing them in place of those
pointers in the copy. Then this copy is sent over ioctl.

Use this mechanism for netlink too. Create a copy of src/dst addresses. Use
the existing function pf_addr_copyout() to convert pointers to counters both
for src/dst and pool addresses.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D46291
2024-08-15 11:11:59 +02:00
Kajetan Staszkiewicz
6c479edc61 pf: Fix indentation in struct pf_ksrc_node
This is a purely cosmetic change to simplify future diffs.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D46298
2024-08-15 09:36:18 +02:00
Igor Ostapenko
8aaffd78c0 Add dummymbuf module for testing purposes
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D45928
2024-08-15 09:28:13 +02:00
Paul Dagnelie
bd4f2023bb Add missing kstats to dataset kstats
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #16431
(cherry picked from commit 244ea5c488)
2024-08-14 15:42:00 -07:00
Gleb Smirnoff
b458ddf27f zfs: fix build without MAC
2024-08-14 09:06:31 -07:00
Kristof Provost
89f6723288 pf: invert direction for inner icmp state lookups
(e.g. traceroute with icmp)
ok henning, jsing

Also extend the test case to cover this scenario.

PR:		280701
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-08-14 14:15:07 +02:00
Warner Losh
8c44df321c nvme: Add a clarifying comment
While it is easy enough to bounce over to nvme.c from nvme_ctrlr.c to
find this out, I've had to do that several times, so a little bit of
context is quite helpful.

Sponsored by:		Netflix
2024-08-13 16:46:41 -06:00
Warner Losh
d40fc35f93 nvme: Make is_initialized a bool
is_initialized is used as a bool everywhere, and we never do any atomics
with it, so make it really a bool.

Sponsored by:		Netflix
2024-08-13 16:46:41 -06:00
Ed Maste
3192fc3023 x86: Enable Intel DMAR by default
APIC ID 255 and above require x2APIC and DMAR interrupt remapping.
FreeBSD is starting to be tested on high core count Intel systems that
meet these criteria.

Reviewed by:	kib, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42435
2024-08-13 16:02:45 -04:00
Ed Maste
2777a32588 iommu: disable dma by default
APIC ID 255 and above require x2APIC and DMAR interrupt remapping.
FreeBSD is starting to be tested on high core count Intel systems that
meet these criteria.  We're going to enable DMAR by default to support
this, so default hw.iommu.dma to 0 to avoid a significant performance
regression.

Reviewed by:	kib, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D42435
2024-08-13 16:00:47 -04:00
SHENG-YI HONG
d3b05d0ea1 Add smbus and i2c device IDs for Meteor Lake
Reviewed by:	emaste, Daniel Schaefer <dhs@frame.work>
MFC after:	3 days
Sponsored by:	Framework Computer Inc
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D46249
2024-08-13 20:52:19 +08:00
Kristof Provost
2da98eef1f pf: fix icmp-in-icmp state lookup
In 534ee17e6 pf state checking for ICMP(v6) was made stricter. This change
failed to correctly set the pf_pdesc for ICMP-in-ICMP lookups, resulting in ICMP
error packets potentially being dropped incorrectly.
Specifically, it copied the ICMP header into a separate variable, not into the
pf_pdesc.

Populate the required pf_pdesc fields for the embedded ICMP packet's state lookup.

PR:		280701
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-08-13 13:23:17 +02:00
Kristof Provost
82e021443a pf: cope with SCTP port re-use
Some SCTP implementations will abort connections and then later re-use the same
port numbers (i.e. both src and dst) for a new connection, before pf has fully
purged the old connection.

Apply the same hack we already have for similarly misbehaving TCP
implementations and forcibly remove the old state so we can create a new one.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-08-13 13:16:12 +02:00
Mark Johnston
8805377dad dtraceall: Make dtaudit a dependency
Reported by:	tsoome
Reviewed by:	tsoome
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D46274
2024-08-12 23:06:45 +00:00
Mark Johnston
27211b7998 mlx5: Remove a less than helpful debug print
Reviewed by:	khng
Fixes:	e23731db48 ("mlx5en: add IPSEC_OFFLOAD support")
Differential Revision:	https://reviews.freebsd.org/D46273
2024-08-12 23:06:01 +00:00
Mark Johnston
fc4365853f socket: Fix handling of listening sockets in sotoxsocket()
A lock needs to be held to ensure that the socket does not become a
listening socket while sotoxsocket() is loading fields from the socket
buffers, as the memory backing the socket buffers is repurposed when
transitioning to a listening socket.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
2024-08-12 22:53:26 +00:00
Mark Johnston
07f2ed5ce8 socket: Make the sopt_rights field a pointer to const
No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
2024-08-12 22:53:26 +00:00
Mark Johnston
0a8e5aaa97 socket: Add macros to assert that the caller holds a socket I/O lock
Remove some unused macros while here.  No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
2024-08-12 22:53:26 +00:00
Doug Moore
b70247df0b axgbe: use bit_foreach
bitstring.h includes a definition of bit_foreach, for iterating over
the set bits of a bitstring. axgbe implements its own version of this
for bitstrings. Drop it, and use the bitstring method.

Reviewed by:	des
Differential Revision:	https://reviews.freebsd.org/D46037
2024-08-12 16:04:32 -05:00
Li-Wen Hsu
6ea4d95f6c Move support of Realtek 8156/8156B from cdce(4) to ure(4)
Reviewed by:	kevlo, imp, hrs
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45088
2024-08-12 14:56:28 +08:00
Fernando Apesteguía
5ab6ed93cd faccessat(2): Honor AT_SYMLINK_NOFOLLOW
Make the system call honor `AT_SYMLINK_NOFOLLOW`.

Also enable this from `linux_faccessat2` where the issue first arose.
Update manual pages accordingly.

PR:			275295
Reported by:		kenrap@kennethraplee.com
Approved by:		kib@
Differential Revision:	https://reviews.freebsd.org/D46267
2024-08-11 17:49:06 +02:00
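
From userland, the behaviour enabled by 5ab6ed93cd might be used as in the sketch below. `link_itself_exists()` is a hypothetical helper; it assumes a system where faccessat(2) accepts `AT_SYMLINK_NOFOLLOW`, as FreeBSD does after this commit.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if the final path component itself exists.  With
 * AT_SYMLINK_NOFOLLOW a symbolic link is checked rather than its target,
 * so a dangling symlink still counts as existing.
 */
static int
link_itself_exists(const char *path)
{
	assert(path != NULL);
	return (faccessat(AT_FDCWD, path, F_OK, AT_SYMLINK_NOFOLLOW) == 0);
}
```

Without the flag, the same check on a dangling symlink fails with ENOENT because the call follows the link to a target that is not there.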
Mark Johnston
aea9dba46b turnstile: Mention the lock name when panicking due to a sleeping thread
This will hopefully make it a bit easier to track down the cause of such
panics.

MFC after:	2 weeks
2024-08-10 15:42:35 +00:00
Martin Matuska
ce4dcb97ca zfs: merge openzfs/zfs@9c56b8ec7
Notable upstream pull request merges:
 #15817 5536c0dee Sync AUX label during pool import
 #15889 c7ada64bb ddt: dedup table quota enforcement
 #15890 62e7d3c89 ddt: add support for prefetching tables into the ARC
 #15894 e26b3771e spa_preferred_class: pass the entire zio
 #15894 d54d0fff3 dnode: allow storage class to be overridden by object type
 #16197 55427add3 Several improvements to ARC shrinking
 #16217 -multiple JSON output for various zfs and zpool subcommands
 #16248 24e6585e7 libzfs.h: Set ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN
                  to ZAP_MAXVALUELEN
 #16264 9dfc5c4a0 Fix long_free_dirty accounting for small files
 #16268 ed0db1cc8 Make txg_wait_synced conditional in zfsvfs_teardown,
                  for FreeBSD
 #16288 d60debbf5 Fix sa_add_projid to lookup and update SA_ZPL_DXATTR
 #16308 ec580bc52 zfs: add bounds checking to zil_parse
 #16310 c21dc56ea Fix zdb_dump_block for little endian
 #16315 7ddc1f737 zil: add stats for commit failure/fallback
 #16326 b0bf14cdb abd: lift ABD zero scan from zio_compress_data()
                  to abd_cmp_zero()
 #16337 c8184d714 Block cloning conditionally destroy ARC buffer
 #16338 dbe07928b Add support for multiple lines to the sharenfs property
                  for FreeBSD
 #16374 1a3e32e6a Cleanup DB_DNODE() macros usage
 #16374 ed87d456e Skip dnode handles use when not needed
 #16346 fb6d8cf22 Add some missing vdev properties
 #16364 670147be5 zvol: ensure device minors are properly cleaned up
 #16382 dea8fabf7 FreeBSD: Fix RLIMIT_FSIZE handling for block cloning
 #16387 aef452f10 Improve zfs_blkptr_verify()
 #16395 cbcb52243 Fix the names of some FreeBSD sysctls in
                  include/tunables.cfg
 #16401 5b9f3b766 Soften pruning threshold on not evictable metadata
 #16404 cdd53fea1 FreeBSD: Add missing memory reclamation accounting
 #16404 1fdcb653b Once more refactor arc_summary output
 #16419 1f5bf91a8 Fix memory corruption during parallel zpool import
                  with -o cachefile
 #16426 cf6e8b218 zstream: remove duplicate highbit64 definition

Obtained from:	OpenZFS
OpenZFS commit:	9c56b8ec78
2024-08-10 11:43:43 +02:00
Stefan Eßer
45d4e82bf6 msdosfs: fix cluster limit when mounting FAT-16 file systems
The maximum cluster number was calculated based on the number of data
clusters that fit in the given partition size and the size of the FAT
area. This limit did not take into account that the highest 10 cluster
numbers are reserved and must not be used for files.

PR:		280347
MFC after:	3 days
Reported by:	pho@FreeBSD.org
2024-08-09 19:26:27 +02:00
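
The limit described in 45d4e82bf6 can be sketched as a simple clamp. The constants and the helper below are hypothetical and only follow the commit message: with 16-bit FAT entries, reserving the highest 10 cluster numbers leaves 0xFFF5 as the largest usable one.

```c
#include <assert.h>
#include <stdint.h>

#define FAT16_MASK_SKETCH	0xFFFFu	/* largest 16-bit FAT entry value */
#define FAT16_RESERVED_SKETCH	10u	/* highest cluster numbers, per the commit */
#define FAT16_LIMIT_SKETCH	(FAT16_MASK_SKETCH - FAT16_RESERVED_SKETCH)
static_assert(FAT16_LIMIT_SKETCH == 0xFFF5u, "highest usable FAT-16 cluster");

/*
 * Highest usable cluster number: bounded by the partition size, by the
 * size of the FAT area, and by the reserved range at the top.
 */
static uint32_t
fat16_max_cluster_sketch(uint32_t by_partition, uint32_t by_fat_size)
{
	uint32_t max;

	max = by_partition < by_fat_size ? by_partition : by_fat_size;
	if (max > FAT16_LIMIT_SKETCH)
		max = FAT16_LIMIT_SKETCH;
	return (max);
}
```

Small file systems are unaffected; the clamp only matters when both the partition and the FAT area would otherwise allow cluster numbers inside the reserved range.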
Pierre Pronchery
6ee6c7b146 acl_copyin: avoid returning uninitialized memory
acl_copyin did not validate the return value of acl_copy_oldacl_into_acl
which could lead to uninitialized acl structure memory reads.

Reported by:    Synacktiv
Reviewed by:	markj, emaste
Sponsored by:   The Alpha-Omega Project
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46125
2024-08-09 10:40:59 -04:00
Michael Tuexen
9b569353e0 tcp: initialize V_ts_offset_secret for all vnets
Initialize V_ts_offset_secret for each vnet, not only for the
default vnet, since it is vnet specific.

Reviewed by:		Peter Lei
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46246
2024-08-09 16:12:22 +02:00