- use nlf_p_empty instead of declaring own empty array
- don't declare _IN() macro when we don't parse a header
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D48306
rsm cannot be NULL, when calling bbr_update_bbr_info().
So no need to check partially for it. No functional change intended.
Reviewed by: rrs
CID: 1523803
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48293
It is already known that rsm != NULL, so no need to check for it.
Reviewed by: rrs
CID: 1523815
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48282
Indicate that the missing of the break is intentionally.
Reviewed by: rrs
CID: 1523782
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48273
When storing the old beta values in rack_swap_beta_values(),
ensure that the newreno_flags field is initialized appropriately
instead of using an uninitialized value.
Since the stored newreno_flags aren't actually used, this fix
should not have any functional change.
Reviewed by: rrs
CID: 1523796
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48260
Instead of dealing with ifp == NULL, which should never happen,
assume that this is not true. Use KASSERT to make this clear.
No functional change intended.
Reviewed by: glebius, rrs
CID: 1523767
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48258
The stack has never limited use of addresses in these ranges as an
endpoint. The relatively recent sysctls control only forwarding of,
and ICMP response to, these addresses.
Reviewed by: bz
Fixes: efe58855f3 ("IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48262
Do not return an uninitialized value from ctf_do_queued_segments()
in case no packets are actually processed (all are skipped).
Reviewed by: rrs
CID: 1523774
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48217
The SUS doesn't mention this error code as a possible one [1]. The FreeBSD
manual page specifies a possible ECONNRESET for close(2):
[ECONNRESET] The underlying object was a stream socket that was
shut down by the peer before all pending data was
delivered.
In the past it had been EINVAL (see 21367f630d), and this EINVAL was
added as a safety measure in 623dce13c6. After conversion to
ECONNRESET it had been documented in the manual page in 78e3a7fdd5, but
I bet wasn't ever tested to actually be ever returned, cause the
tcp-testsuite[2] didn't exist back then. So documentation is incorrect
since 2006, if my bet wins. Anyway, in the modern FreeBSD the condition
described above doesn't end up with ECONNRESET error code from close(2).
The error condition is reported via SO_ERROR socket option, though. This
can be checked using the tcp-testsuite, temporarily disabling the
getsockopt(SO_ERROR) lines using sed command [3]. Most of these
getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm
that close(2) doesn't return ECONNRESET even on a socket that has the
error stored, neither it is returned in the case described in the manual
page. The latter case is covered by multiple tests residing in tcp-
testsuite/state-event-engine/rcv-rst-*.
However, the deleted block of code could be entered in a race condition
between close(2) and processing of incoming packet, when connection had
already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK.
This was reported in the bug 146845. With the block deleted, we will
continue into tcp_disconnect() which has proper handling of INP_DROPPED.
The race explanation follows. The connection is in TCPS_LAST_ACK. The
network input thread acquires the tcpcb lock first, sets INP_DROPPED,
acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED.
Meanwhile, the syscall thread goes through sodisconnect() which checks for
SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on
the tcpcb lock in tcp_usr_disconnect(). Once input thread releases the
lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.
- Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
- Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()
Note that the lockless operation in sodisconnect() isn't correct, but
enforcing the socket lock there will not fix the problem.
[1] https://pubs.opengroup.org/onlinepubs/9799919799/
[2] https://github.com/freebsd-net/tcp-testsuite
[3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)
PR: 146845
Reviewed by: tuexen, rrs, imp
Differential Revision: https://reviews.freebsd.org/D48148
See commit 4f02a7d739 for more background.
I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INADDR_ANY. Looking at the sdr.V2.4a7n sources (mentioned in
bugzilla PR 7713), there is a CANT_MCAST_BIND hack wherein the
application binds to INADDR_ANY instead of a multicast address, but
CANT_MCAST_BIND isn't defined for FreeBSD builds.
It seems unlikely that we still have a use-case for allowing sockets
from different UIDs to bind to the same port when binding to the
unspecified address. And, as noted in D47832, applications like sdr
would have been broken by the inverted SO_REUSEPORT check removed in
that revision, apparently without any bug reports. Let's break
compatibility and simply disallow this case outright.
Also, add some comments, remove a hack in a regression test which tests
this funtionality, and add a new regression test to exercise the
remaining checks that were added in commit 4658dc8325.
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47870
Similar to the existing in6_mask2len() function, but for IPv4. This will be used
by pf's nat64 code.
Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47785
Either input or output return unlocked on failure. Should be no
functional change.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D47925
Actually check the conditions that are enforced by the error checking
code instead of a condition which is
* checking a number to be non-negative instead of positive
* depending on a random number
Perform the checks consistently for ICMPv4 and ICMPv6.
Reviewed by: glebius, rrs, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48001
This check for SO_REUSEPORT was added way back in commit 52b65dbe85.
Per the commit log, this commit restricted this port-stealing check to
unicast addresses, and then only if the existing socket does not have
SO_REUSEPORT set. In other words, if there exists a socket bound to
INADDR_ANY, and we bind a socket to INADDR_ANY with the same port, then
the two sockets need not be owned by the same user if the existing
socket has SO_REUSEPORT set.
This is a surprising semantic; bugzilla PR 7713 gives some additional
context. That PR makes a case for the behaviour described above when
binding to a multicast address. But, the SO_REUSEPORT check is only
applied when binding to a non-multicast address, so it doesn't really
make sense. In the PR the committer notes that "unicast applications
don't set SO_REUSEPORT", which makes some sense, but also refers to
"multicast applications that bind to INADDR_ANY", which sounds a bit
suspicious.
OpenBSD performs the multicast check, but not the SO_REUSEPORT check.
DragonflyBSD removed the SO_REUSEPORT (and INADDR_ANY) checks back in
2014 (commit 0323d5fde12a4). NetBSD explicitly copied our logic and
still has it.
The plot thickens: 20 years later, SO_REUSEPORT_LB was ported from
DragonflyBSD: this option provides similar semantics to SO_REUSEPORT,
but for unicast addresses it causes incoming connections/datagrams to be
distributed among all sockets in the group. This commit (1a43cff92a)
inverted the check for SO_REUSEPORT while adding one for
SO_REUSEPORT_LB; this appears to have been inadvertent. However:
- apparently no one has noticed that the semantics were changed;
- sockets belonging to different users can now be bound to the same port
so long as they belong to a single lbgroup bound to INADDR_ANY, which
is not correct.
Simply remove the SO_REUSEPORT(_LB) checks, as their original
justification was dubious and their current implementation is wrong; add
some tests.
Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47832
For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgroups were not careful to synchronize with unlocked
lookups. I believe that in the worst case this can result in spurious
connection aborts (I have a regression test case to exercise this), but
it's hard to be certain.
Modify in_pcblbgroup* routines to synchronize with unlocked lookup:
- When removing inpcbs from an lbgroup, do not shrink the array.
The maximum number of lbgroup entries is INPCBLBGROUP_SIZMAX (256),
and it doesn't seem worth the complexity to shrink the array when a
socket is removed.
- When resizing an lbgroup, do not insert it into the hash table until
it is fully initialized; otherwise lookups may observe a partially
constructed lbgroup.
- When adding an inpcb to the group, increment the counter after adding
the array entry, using a release store. Otherwise it's possible for
lookups to observe a null array slot.
- When looking up an entry, use a corresponding acquire load.
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48020
Add support for endpoint-independent mapping ("full cone NAT") in
Libalias's UDP NAT.
This conforms to RFC 4787 requirements 1 and 3. All UDP packets sent out from a
particular internal address:port leave via the same NAT address:port,
regardless of their destination.
Add some libalias tests and supporting defines.
Reviewed by: igoro, thj
Differential Revision: https://reviews.freebsd.org/D46689D
A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects, so is a
good candidate to move into its own helper function. This patch does
so, making the callers less complicated and reducing indentation.
While moving this code, also make some changes:
- Load socket options (SO_REUSEADDR etc.) only once. There is nothing
preventing another thread from toggling the socket options, so make
this function easier to reason about by avoiding races.
- When checking whether the bind address is an interface address, make a
separate sockaddr rather than temporarily modifying the one passed to
in_pcbbind().
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47590
The macros version of lock/unlock has already been used 23 times in this
file and the bare version was used 6 times only, so prefer the former.
No functional change.
Add support for the AE Flag in the TCP header to pf and ppp.
Commonalize to the use of "E"(ECE), "W"(CWR) and "e"(AE)
for the TCP header flags, in line with tcpdump.
Reviewers: kp, cc, tuexen, cy, #transport!
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47106
Formally, there are 12 bits for TCP header flags.
Use the accessor functions in more (kernel) places.
No functional change.
Reviewed By: cc, #transport, cy, glebius, #iflib, kbowling
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47063
The variable b[] is on the stack, thus cannot overlap with ipov, which
points to the heap area, so prefer memcpy() over memmove(), aka bcopy().
No functional change intended.
Reviewed by: cc, rrs, cy, #transport, #network
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47713
Use hc_ prefix instead of rmx_. The latter stands for "route metrix" and
is an artifact from the 90-ies, when TCP caching was embedded into the
routing table. The rename should have happened back in 97d8d152c2.
No functional change. Done with sed(1) command:
s/rmx_(mtu|ssthresh|rtt|rttvar|cwnd|sendpipe|recvpipe|granularity|expire|q|hits|updates)/hc_\1/g
If during ip_forward() we find a blackhole (or reject) route we should stop
processing and count this in the 'cantforward' counter, just like we already do
for IPv6.
Blackhole routes are set to use the loopback interface, so we don't actually
incorrectly forward traffic, but we do fail to count it as unroutable.
Test this, both for IPv4 and IPv6.
Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47529
Disable the IP/IP6/ICMP/... counter probe points by default.
They are kept enabled in debug builds, and can be enabled with
'options KDTRACE_MIB_SDT'.
Requested by: glebius
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47657
The socket buffer locking used to be standard on SOCKBUF_LOCK/UNLOCK. But we are now
moving to a more elegant SOCK_SENDBUF_LOCK/UNLOCK and SOCK_RECVBUF_LOCK/UNLOCK.
Lets get BBR and Rack to use these updated macros.
Reviewed by:glebius, tuexen, rscheff
Differential Revision:https://reviews.freebsd.org/D47542
This was useful when TCP timers were not able to reliably stop. Note that
in_pcbinfo_destroy() called later asserts that V_tcbinfo.ipi_count is 0.
This reverts 806929d514, b54e08e11a.
Avoid sending small segments by making sure that cwnd is usually
calculated in full (data) segment sizes. Especially during loss
recovery and retransmission scenarios.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47474
Properly calculate the expected flight size (cwnd) during
limited transmit. Exclude the SACK scoreboard from
consideration when still in limited transmit.
PR: 282605
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47541