in_pcblisten() moves an inpcb from the per-group list into the array, at
which point it becomes visible to inpcb lookups in the datapath. It
assumes that there is space in the array for this, but that's not
guaranteed, since in_pcbinslbgrouphash() doesn't reserve space in the
array if the inpcb isn't associated with a listening socket.
We could resize the array in in_pcblisten(), but that would introduce a
failure case where there currently is none. Instead, keep track of the
number of pending inpcbs as well, and modify in_pcbinslbgrouphash() to
reserve space for each pending (i.e., not-yet-listening) inpcb.
Add a regression test.
Reviewed by: glebius
Reported by: netchild
Fixes: 7cbb6b6e28 ("inpcb: Close some SO_REUSEPORT_LB races, part 2")
Differential Revision: https://reviews.freebsd.org/D49100
net80211 has inconsistent locking when calling into (*ic_ampdu_rx_stop)().
Make use of 054c5ddf58 and conditionally check if the caller
locked or not and if locked temporary drop the lock to avoid sleeping
on a non-sleepaable lock during the downcall into the driver.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
ieee80211_sta_recalc_aggregates() shows up in TODO traces but there is
nothing we have to do there until we have active links (MLO support).
Make the TODO conditional for a time when we will get there.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Define cfg80211 IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK to
net80211 IEEE80211_VHTCAP_SUPP_CHAN_WIDTH_MASK.
We should likely at some point make a sweep and replace all the
values with the defines from the comments for the matching net80211
version.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When setting the per-band supp_rates bitfield check for mandatory
rates only. We cannot easily say at that point for 2Ghz whether
11g is supported so assume these days it is not pure-b.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Implement cfg80211_chandef_create() to work with HT. Update enum
with HT channel types. When calling the function from LinuxKPI 802.11
code, pass in NL80211_CHAN_HT20 if HT is supported rather than NO_HT.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
UDP allows to sendto(2) on unconnected socket. The original BSD devise
was that such action would create a temporary (for the duration of the
syscall) connection between our inpcb and remote addr:port specified in
sockaddr 'to' of the syscall. This devise was broken in 2002 in
90162a4e87. For more motivation on the removal of the temporary
connection see email [1].
Since the removal of the true temporary connection the sendto(2) on
unconnected socket has the following side effects:
1) After first sendto(2) the "unconnected" socket will receive datagrams
destined to the selected port.
2) All subsequent sendto(2) calls will use the same source port.
Effectively, such sendto(2) acts like a bind(2) to INADDR_ANY:0. Indeed,
if you do this:
s1 = socket(PF_INET, SOCK_DGRAM, 0);
s2 = socket(PF_INET, SOCK_DGRAM, 0);
sendto(s1, ..., &somedestination, ...);
bind(s2, &{ .sin_addr = INADDR_ANY, sin_port = 0 });
And then look into kgdb at resulting inpcbs, you would find them equal in
all means modulo bound to different anonymous ports.
What is even more interesting is that Linux kernel had picked up same
behavior, including that "unconnected" socket will receive datagrams. So
it seems that such behavior is now an undocumented standard, thus I
covered it in recently added tests/sys/netinet/udp_bindings.
Now, with the above knowledge at hand, why are we using
in_pcbconnect_setup() and in_pcbinshash(), which are supposed to be
private to in_pcb.c, to achieve the binding? Let's use public KPI
in_pcbbind() on the first sendto(2) and use in_pcbladdr() on all
sendto(2)s. Apart from finally hiding these two should be private
functions, we no longer acquire global INP_HASH_WLOCK() for every
sendto(2) on unconnected socket as well as remove a couple workarounds.
[1] https://mail-archive.FreeBSD.org/cgi/mid.cgi?200210141935.aa83883
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49043
When a socket has SO_BROADCAST set and destination address is INADDR_ANY
or INADDR_BROADCAST, the kernel shall pick up first broadcast capable
interface and broadcast the packet out of it. Since this API is not
reliable on a machine with > 1 broadcast capable interfaces, all practical
software seems to use IP_ONESBCAST or other mechanisms to send broadcasts.
This has been broken at least since FreeBSD 6.0, see bug 99558. Back then
the problem was in the fact that in_broadcast() check was always done
against the gateway address, not the destination address. Later, with
90cc51a1ab, a second problem piled on top - we aren't checking for
INADDR_ANY and INADDR_BROADCAST at all.
Better late than never, fix that by checking destination address.
PR: 99558
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49042
This aligns with existing in_ifaddr_broadcast() and aligns with other
simple functions or macros with bare "in_" prefix that operator just on
struct in_addr and nothing else, e.g. in_nullhost(). No functional
change.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49041
If the lock is unowned (i.e., owner == UMUTEX_CONTESTED), we might get a
spurious failure, and in that case we need to retry the loop.
Otherwise, the calling thread can end up sleeping forever.
The same problem exists in do_set_ceiling(), which open-codes
do_lock_pp(), so fix it there too.
Reviewed by: olce
Reported by: Daniel King <dmking@adacore.com>
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D49031
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D49096
Create empty hints files for these platforms. They don't normally need a
hints file, but people use them for device instance wiring. It's less
confusing if they always exist.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D49052
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D49096
We already pass struct pf_pdesc to pf_icmp_state_lookup(). There's no need to
also pass the direction.
Sponsored by: Rubicon Communications, LLC ("Netgate")
In most cases, IP fragments do not have an Ethernet padding. So
add a condition to save a useless call to m_adj() and have a paranoid
length check in the other cases.
OK henning@
Obtained from: OpenBSD, bluhm <bluhm@openbsd.org>, fcf0d61153
Obtained from: OpenBSD, chris <chris@openbsd.org>, ebe64b684c
Sponsored by: Rubicon Communications, LLC ("Netgate")
change log(matches) semantics slightly to make it more useful. since it
is a debug tool change of semantics not considered problematic.
up until now, log(matches) forced logging on subsequent matching rules,
the actual logging used the log settings from that matched rule.
now, log(matches) causes subsequent matches to be logged with the log settings
from the log(matches) rule. in particular (this was the driving point),
log(matches, to pflog23) allows you to have the trace log going to a seperate
pflog interface, not clobbering your regular pflogs, actually not affecting
them at all.
long conversation with bluhm about it, which didn't lead to a single bit
changed in the diff but was very very helpful. ok bluhm as well.
Obtained from: OpenBSD, henning <henning@openbsd.org>, f61b1efcce
Sponsored by: Rubicon Communications, LLC ("Netgate")
This adds the ESS EDMA driver introduced by the IPQ4018/IPQ4019.
It provides a number of transmit and receive rings which can be mapped
into virtual ethernet devices, which this driver supports.
It's partially integrated into the ar40xx etherswitch which supplies
the port and some filtering/VPN offload functionality. This driver
only currently supports the per-port options which allow for the
virtual ethernet driver mapping.
This was written by reverse engineering the functionality of the
ethernet switch and ethernet driver support provided by Qualcomm
Atheros via their OpenWRT contributions. The code is all originally
authored by myself.
Differential Revision: https://reviews.freebsd.org/D49027
Comply with style(9) and andd checks for booleaness when doing
bit tests.
If there is no need for double negated checks simplify them.
This all makes the conditions a lot easier to read.
Slip in a comment about MIC vs. MMIC.
No functional changes.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: emaste, adrian
Differential Revision: https://reviews.freebsd.org/D49055
Notable upstream pull request merges:
#16857387ed5ca4 Add recursive dataset mounting and unmounting support
to pam_zfs_key
#16929c2458ba92 optimize recv_fix_encryption_hierarchy()
#1698012f0baf34 Make the vfs.zfs.vdev.raidz_impl sysctl cross-platform
#1698640496514b Expand fragmentation table to reflect larger possibile
allocation sizes
#1700388020b993 Add kstats tracking gang allocations
#1701321205f648 Avoid ARC buffer transfrom operations in prefetch
#17016390f6c119 zio: lock parent zios when updating wait counts on
reexecute
#17029b8c73ab78 zio: do no-op injections just before handing off to vdevs
#170376a2f7b384 Fix metaslab group fragmentation math
#17040b901d4a0b Update the dataset name in handle after zfs_rename
Obtained from: OpenZFS
OpenZFS commit: 6a2f7b3844
During my progress on updating cc_cubic to RFC9438, found such redundancy as:
- W_est: we use the alternative stack local variable `W_est` in
`cubic_ack_received()`.
- cwnd_prior: it is used for Reno-Friendly Region in RFC9438 Section 4.3,
but we use the alternative cwnd from NewReno for Reno-Friendly as
in commit ee45061051.
No functional change intended.
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D49008
In particular, export a "port" entry as well as an array of "host"
entries for each active connection.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48775
This finally allows to use source-hash for dynamic loadbalancing, eg.
"rdr-to <hosts> source-hash", instead of just round-robin and least-states.
An older pre-siphash version of this diff was tested by many people.
OK tedu@ benno@
Obtained from: OpenBSD, reyk <reyk@openbsd.org>, 252a05523f
Sponsored by: Rubicon Communications, LLC ("Netgate")
for ipv6, we stretch it out a bit, but good enough.
ok reyk
Obtained from: OpenBSD, tedu <tedu@openbsd.org>, a558d13e2f
Sponsored by: Rubicon Communications, LLC ("Netgate")
Certain NFSv4.1 callbacks are not currently supported/used
by the FreeBSD client. Without this patch, NFS4ERR_NOTSUPP
is replied for the callbacks. Since NFSv4.1 does not specify
all of these callbacks as optional, I think it is preferable
to reply NFS_OK or NFS4ERR_REJECT_DELEG instead of NFS4ERR_NOTSUPP.
This patch changes the reply status for these unsupported
callbacks, which the client has no use for.
I am not aware of any NFSv4.1 servers that will perform
any of these callbacks against the FreeBSD client at this time.
MFC after: 2 weeks
They are used by the DRM drivers in Linux 6.7.
Bump `FreeBSD_version` because external drivers that use `struct
shrinker` will have to be recompiled.
Reviewed by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48747
This helps developers working with linuxkpi find out a function is
already defined but is just a stub.
Reported by: bz
Reviewed by: bz, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48862
Commit 03e39d3d42 decreased the Linux
version that exposes this constant to be Linux 6.7. It happens that the
constant is older.
However, it is removed in Linux 6.10. Let's change the version condition
to say that it is defined for any version before 6.10.
Reported by: bz
Reviewed by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48861
To use the old version, one has to explicitly set `LINUXKPI_VERSION` to
the expected version of Linux KPI.
Reported by: bz
Reviewed by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48860
In this case, the domain from which the memory to back the VM page array
is allocated does not matter, so just let vm_phys_early_alloc() choose
a suitable domain.
Reviewed by: jhibbits, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48888
Change the comment before this block of code, and separate the latter
from the preceding one by an empty line.
Move the loop on phys_avail[] to compute the minimum and maximum memory
physical addresses closer to the initialization of 'low_avail' and
'high_avail', so that it's immediately clear why the loop starts at
2 (and remove the related comment).
While here, fuse the additional loop in the VM_PHYSSEG_DENSE case that
is used to compute the exact physical memory size.
This change suppresses one occurence of detecting whether at least one
of VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE is defined at compile time, but
there is still another one in PHYS_TO_VM_PAGE().
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48632
Previously, such requests would lead to a panic. The only caller so far
(vm_phys_early_startup()) actually faces the case where some address can
be one of the chunk's boundaries and has to test it by hand. Moreover,
a later commit will introduce vm_phys_early_alloc_ex(), which will also
have to deal with such boundary cases.
Consequently, make this function handle boundaries by not splitting the
chunk and returning EJUSTRETURN instead of 0 to distinguish this case
from the "was split" result.
While here, expand the panic message when the address to split is not in
the passed chunk with available details.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48630
On improper termination of phys_avail[] (two consecutive 0 starting at
an even index), this function would (unnecessarily) continue searching
for the termination markers even if the index was out of bounds.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48629
Segments are passed by machine-dependent routines, so explicit checks
will make debugging much easier on very weird machines or when someone
is tweaking these machine-dependent routines. Additionally, this
operation is not performance-sensitive.
For the same reasons, test that we don't reach the maximum number of
physical segments (the compile-time of the internal storage) in
production kernels (replaces the existing KASSERT()).
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48628
A bad specification is if 'start' is strictly greater than 'end', or
bounds are not page aligned.
The latter was already tested under INVARIANTS, but now will be also on
production kernels. The reason is that vm_phys_early_startup() pours
early segments into the final phys_segs[] array via vm_phys_add_seg(),
but vm_phys_early_add_seg() did not check their validity. Checking
segments once and for all in vm_phys_add_seg() avoids duplicating
validity tests and is possible since early segments are not used before
being poured into phys_segs[]. Finally, vm_phys_add_seg() is not
performance critical.
Allow empty segments and discard them (silently, unless 'bootverbose' is
true), as vm_page_startup() was testing for this case before calling
vm_phys_add_seg(), and we felt the same test in vm_phys_early_startup()
was due before calling vm_phys_add_seg(). As a consequence, remove the
empty segment test from vm_page_startup().
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48627
The passed index must be the start of a chunk in phys_avail[], so must
be even. Test for that and print a separate panic message.
While here, fix panic messages: In one, the wrong chunk boundary was
printed, and in another, the desired but not the actual condition was
printed, possibly leading to confusion.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48626
When we receive an ICMP packet containing another ICMP packet we look up the
original ICMP state. This is done through a second struct pf_pdesc ('pd2'),
containing relevant information (i.e. addresses, type, id, ..).
pd2 did not contain the network interface ('kif'), leading to state lookup
failures. This only affected if-bound mode, because floating states match all
interfaces.
Set kif in pd2.
Extend the icmp.py:test_fragmentation_needed test case to use if-bound mode. It
already checked that we handled icmp-in-icmp correctly.
PR: 284866
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
We ran into problems with locking the VIF for the lsta tailq on and
off. Switch from a native tailq to the LinuxKPI list.
This allows us to implement the "rcu" part in
linuxkpi_ieee80211_iterate_keys() which we could not before.
Further using either rcu or the wiphy lock we no longer run into
problems with the lock not being sleepable.
The last case was rtw89 debugfs which was doing a sleepable alloc
in the iterator callback of linuxkpi_ieee80211_iterate_stations_atomic().
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Rather than just printing the cipher suite as uint32_t hex
(or split into OUI and number) also print a short name.
iwlwifi(4), for example, now prints on startup with HW_CRYPTO on:
unsupported WLAN Cipher Suite 0x000fac | 8 (GCMP)
unsupported WLAN Cipher Suite 0x000fac | 9 (GCMP_256)
unsupported WLAN Cipher Suite 0x000fac | 6 (AES_CMAC)
unsupported WLAN Cipher Suite 0x000fac | 11 (BIP_GMAC_128)
unsupported WLAN Cipher Suite 0x000fac | 12 (BIP_GMAC_256)
Likewise _lkpi_iv_key_set() would now print:
iwlwifi0: _lkpi_iv_key_set: CIPHER SUITE 0xfac02 (TKIP) not supported
Sponsored by: The FreeBSD Foundation
MFC after: 3 days