7823 commits

Author SHA1 Message Date
Richard Scheffenegger
f071abd92e tcp: properly initialize LRD while accepting session in syncache
Inherit the setting from the listener socket in syncache_socket.

MFC after:             2 weeks
Reviewed By:           tuexen, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42874
2023-12-02 12:17:01 +01:00
Richard Scheffenegger
f42518ff12 tcp: for LRD move sysctl from tcp.do_lrd to tcp.sack.lrd, remove sockopt
Moving lrd sysctl to the tcp.sack branch, since LRD only works with SACK.
Remove the sockopt to programmatically control LRD per session.

Reviewed By:           #transport, tuexen, rrs
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42851
2023-11-30 21:11:45 +01:00
Gleb Smirnoff
0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simpler syscalls that return socket addresses.  Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
Gleb Smirnoff
cfb1e92912 sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.

While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
the required length if the provided length was insufficient.  Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do.  Linux also does that. Update tests accordingly.

Reviewed by:		rscheff, tuexen, zlei, dchagin
Differential Revision:	https://reviews.freebsd.org/D42635
2023-11-30 08:30:55 -08:00
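The in/out 'addrlen' behaviour described for cfb1e92912 can be sketched in plain C. This is only an illustration of the contract, not the kernel code: copy_addr_out() is a hypothetical helper, and the sketch assumes the caller passes the buffer size in *addrlen and wants the full address length back even when the copy was truncated.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: copy an address into a caller-supplied buffer.
 * On entry *addrlen is the buffer size; on return it holds the full
 * length of the address, even if the copy had to be truncated.
 */
void
copy_addr_out(const struct sockaddr *src, socklen_t srclen,
    void *dst, socklen_t *addrlen)
{
	socklen_t n;

	assert(src != NULL && addrlen != NULL);
	n = (*addrlen < srclen) ? *addrlen : srclen;
	if (dst != NULL && n > 0)
		memcpy(dst, src, n);	/* truncated copy if the buffer is short */
	*addrlen = srclen;		/* report the length actually required */
}
```

A caller that sees the returned length exceed the size it passed in knows the address was truncated and can retry with a larger buffer.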
Richard Scheffenegger
34c45bc6a3 tcp: enable LRD by default
Lost Retransmission Detection was added as a
feature in May 2021, but disabled by default.

Enabling the feature by default to reduce the
flow completion time by avoiding RTOs when
retransmissions get lost too.

Reviewed By:           tuexen, #transport, zlei
MFC after:             10 weeks
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42845
2023-11-30 05:38:16 +01:00
Randall Stewart
6a79e48076 Fix two latent bugs in hpts. One where a static is put on
a local variable, the other an initialization bug where
we should be setting tv.tv_sec to 0.

PR:	275482
2023-11-27 14:38:06 -05:00
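The two bug classes named in 6a79e48076 can be illustrated with a small, self-contained sketch. It does not reproduce the hpts code: all function names are hypothetical, and the point is only why a stray 'static' on a local and an unset tv_sec field are latent bugs.

```c
#include <assert.h>
#include <sys/time.h>

/* BUG pattern: 'static' makes the local persist (and be shared) across calls. */
int
sum_buggy(const int *v, int n)
{
	static int total;	/* keeps its value between calls */

	for (int i = 0; i < n; i++)
		total += v[i];
	return (total);
}

/* Fixed: an automatic variable starts fresh on every call. */
int
sum_fixed(const int *v, int n)
{
	int total = 0;

	for (int i = 0; i < n; i++)
		total += v[i];
	return (total);
}

/* Build a sub-second timeval, setting every field before use. */
struct timeval
usecs_to_timeval(long usecs)
{
	struct timeval tv;

	assert(usecs >= 0 && usecs < 1000000);
	tv.tv_sec = 0;		/* without this the field holds stack garbage */
	tv.tv_usec = usecs;
	return (tv);
}
```

Calling sum_buggy() twice on the same input returns different results, while sum_fixed() does not.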
Warner Losh
fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh
29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Mitchell Horne
d79a9edb5c alq, siftr: add panic/debugger checks to shutdown hooks
Don't try to gracefully terminate the pkt_manager thread if the
scheduler is not running.

We should not attempt to shutdown ald if RB_NOSYNC is set, and must not
if the scheduler is stopped (the function calls wakeup()).

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D42340
2023-11-23 12:07:42 -04:00
Kristof Provost
b01cad6d3a ip_mroute: handle V_mfchashtbl allocation failure
We allocate V_mfchashtbl with HASH_NOWAIT (which maps to M_NOWAIT), so
this allocation may fail. As we didn't handle that failure we could end
up dereferencing a NULL pointer later (e.g. during X_ip_mrouter_done()).

Do the obvious thing and fail out if we cannot allocate the table.

See also:	https://redmine.pfsense.org/issues/14917
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-11-22 14:47:14 +01:00
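The pattern behind b01cad6d3a (a non-sleeping allocation may return NULL, so the caller must fail out) can be sketched in userland C. The types and functions below are hypothetical stand-ins, not the ip_mroute code; the allocator is passed in as a hook only so that a failing allocation can be simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct bucket {
	void *head;
};

struct table {
	struct bucket *buckets;
	size_t nbuckets;
};

/* Allocator hook so a failing, non-sleeping allocation can be simulated. */
typedef void *(*alloc_fn)(size_t count, size_t size);

/* Returns 0 on success, ENOMEM if the table could not be allocated. */
int
table_init(struct table *t, size_t nbuckets, alloc_fn alloc)
{
	assert(t != NULL && alloc != NULL);
	t->buckets = alloc(nbuckets, sizeof(*t->buckets));
	if (t->buckets == NULL) {
		t->nbuckets = 0;
		return (ENOMEM);	/* fail out instead of using NULL later */
	}
	t->nbuckets = nbuckets;
	return (0);
}

/* Teardown tolerates a table that was never successfully set up. */
void
table_done(struct table *t)
{
	free(t->buckets);	/* free(NULL) is a no-op */
	t->buckets = NULL;
	t->nbuckets = 0;
}

/* Test double standing in for an allocation that fails. */
void *
alloc_always_fails(size_t count, size_t size)
{
	(void)count;
	(void)size;
	return (NULL);
}
```

Passing calloc as the hook gives the normal path; passing alloc_always_fails exercises the error path that was previously unhandled.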
Gleb Smirnoff
219a6ca919 tcp: uninline tcp_account_for_send()
This allows removing the inclusion of "opt_kern_tls.h" from a system header.

Reviewed by:		rscheff, tuexen
Differential Revision:	https://reviews.freebsd.org/D42696
2023-11-21 09:21:41 -08:00
Gleb Smirnoff
bbbd7aab1b inpcb: garbage collect in_pcbnotifyall()
2023-11-20 14:38:31 -08:00
Richard Scheffenegger
49a6fbe387 [tcp] add PRR 6937bis heuristic and retire prr_conservative sysctl
Improve Proportional Rate Reduction (RFC6937) by using a
heuristic, which automatically chooses between
conservative CRB and more aggressive SSRB modes.
Only when snd_una advances (a partial ACK), SSRB may be
used. Also, that ACK must not have any indication of
ongoing loss - using the addition of new holes into the
scoreboard as proxy for such an event.

MFC after: 4 weeks
Reviewed By: #transport, kbowling, rrs
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28822
2023-11-15 23:10:29 +01:00
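As a rough illustration of what 49a6fbe387 describes, the per-ACK send quota of PRR can be written out following the RFC 6937 pseudocode, with a flag choosing between the CRB and SSRB bounds. This is a sketch, not the FreeBSD implementation: the function name and the safe_ack parameter (true when snd_una advanced and no new holes were added to the scoreboard) are assumptions, and the final clamp to zero is added here for convenience.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes that may be sent in response to one ACK during loss recovery. */
int64_t
prr_sndcnt(int64_t pipe, int64_t ssthresh, int64_t recover_fs,
    int64_t prr_delivered, int64_t prr_out, int64_t delivered_data,
    int64_t mss, bool safe_ack)
{
	int64_t limit, sndcnt;

	assert(recover_fs > 0);
	if (pipe > ssthresh) {
		/* Proportional part: CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out */
		sndcnt = (prr_delivered * ssthresh + recover_fs - 1) /
		    recover_fs - prr_out;
	} else {
		limit = prr_delivered - prr_out;	/* conservative bound (CRB) */
		if (safe_ack) {
			/* slow-start bound (SSRB): MAX(limit, DeliveredData) + MSS */
			if (delivered_data > limit)
				limit = delivered_data;
			limit += mss;
		}
		/* Catch up towards ssthresh, as permitted by the bound. */
		sndcnt = ssthresh - pipe;
		if (limit < sndcnt)
			sndcnt = limit;
	}
	return (sndcnt > 0 ? sndcnt : 0);
}
```

With the same counters, an ACK judged safe allows one extra MSS compared with the conservative bound, which is the difference the heuristic selects between.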
Gleb Smirnoff
70e30addaf tcp: remove extraneous network epoch entry
accept(2) on IPv6 TCP doesn't need epoch.  Some leaf functions may
need it, but they will enter accordingly, see sa6_recoverscope().

Reviewed by:		rscheff, tuexen (implicitly, see deleted XXXMT)
Differential Revision:	https://reviews.freebsd.org/D42634
2023-11-16 18:30:35 -08:00
Michael Tuexen
3bbbfc8dcd sctp: minor cleanup
No functional change intended.
MFC after:	1 week
2023-11-06 11:04:15 +01:00
Michael Tuexen
03c3a70abe udplite: make socketoption available on IPv6 sockets
This patch allows the IPPROTO_UDPLITE-level socket options
UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on
AF_INET6 sockets in addition to AF_INET sockets.

Reviewed by:		ae, rscheff
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D42430
2023-11-05 15:28:54 +01:00
Michael Tuexen
b10ae5a9b2 tcp rack: remove references to rb trees
The references should have been removed in
https://cgit.freebsd.org/src/commit/?id=030434acaf4631c4e205f8bccedcc7f845cbfcbf

Reviewed by:		rscheff, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D42386
2023-11-05 15:21:06 +01:00
Igor Ostapenko
b68d2789f0 ip_var.h: align comment style
MFC after:	2 weeks
Reviewed by:	kp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/883
2023-11-01 15:41:36 +01:00
Michael Tuexen
aa64a8f5c3 udplite: fix checksum computation on the sender side
Don't fill the fields of the UDP/IP header not used for the
checksum computation before performing the checksum computation.

Reviewed by:		glebius
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D42275
2023-11-01 10:24:56 +01:00
Warner Losh
afd155c72b netinet: The tailq_hash code doesn't reference tcpoutflags
Don't define TCPOUTFLAGS to get the static definition from tcp_fsm.h.
tailq_hash.c doesn't reference tcpoutflags. Only files that reference this
should define TCPOUTFLAGS. clang is fine with it, but gcc12 complained.

Sponsored by:		Netflix
2023-10-27 08:41:25 -06:00
Mark Johnston
876fddc886 tcp: Silence a -Wunused-function warning in tcp_ratelimit.h
No functional change intended.
2023-10-25 10:03:58 -04:00
Kristof Provost
c1146e6ad6 pf: use an enum for packet direction in divert tag
The benefit is that in the debugger you will see PF_DIVERT_MTAG_DIR_IN
instead of 1 when looking at a structure. And compilation time failure
if anybody sets it to a wrong value. Using "port" instead of "ndir" when
assigning a port improves readability of code.

Suggested by:	glebius
MFC after:	3 weeks
X-MFC-With:	fabf705f4b
2023-10-20 09:16:08 +02:00
Igor Ostapenko
fabf705f4b pf: fix pf divert-to loop
Resolved conflict between ipfw and pf if both are used and pf wants to
do divert(4) by having separate mtags for pf and ipfw.

Also fix the incorrect 'rulenum' check, which caused the reported loop.

While here add a few test cases to ensure that divert-to works as
expected, even if ipfw is loaded.

divert(4)
PR:		272770
MFC after:	3 weeks
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D42142
2023-10-19 12:12:15 +02:00
Richard Scheffenegger
22dc8609c5 tcp: use signed IsLost() related accounting variables
Coverity found that one safety check (kassert) was not
functional, as possible incorrect subtractions during
the accounting wouldn't show up as (invalid) negative
values.

Reported by: gallatin
Reviewed By: cc, #transport
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42180
2023-10-17 16:37:09 +02:00
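Why the check mentioned in 22dc8609c5 could never fire with unsigned counters can be shown in a few lines. The names below are hypothetical and the sketch is not taken from the TCP code; it only contrasts an over-subtraction on an unsigned counter, which wraps to a huge positive value, with the same operation on a signed counter, where a "not negative" check can catch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned accounting: over-subtraction silently wraps modulo 2^32. */
uint32_t
sub_unsigned(uint32_t counter, uint32_t delta)
{
	return (counter - delta);
}

/* Signed accounting: over-subtraction yields a visible negative value. */
int32_t
sub_signed(int32_t counter, int32_t delta)
{
	assert(delta >= 0);	/* callers only subtract non-negative amounts */
	return (counter - delta);
}

/* Stand-in for the safety check; only meaningful for a signed type. */
bool
accounting_is_sane(int32_t counter)
{
	return (counter >= 0);
}
```

Subtracting 1460 from 100 gives 4294965936 in the unsigned version, which passes any "greater than or equal to zero" test, and -1360 in the signed version, which fails it.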
Kristof Provost
ab393e9548 netlink: move NETLINK define to opt_global.h
Move the NETLINK define into opt_global.h so we can rely on it being
set correctly, without having to remember to include opt_netlink.h.
This ensures that the NETLINK define is correctly set. If not we
may end up with unloadable modules, due to missing symbols (such as
nlmsg_get_group_writer).

PR:		274306
Reviewed by:	imp, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42179
2023-10-13 09:23:47 +02:00
Richard Scheffenegger
91ee2d8d9a tcp: clear SACK state when scoreboard is forcefully freed
When a Retransmission Timeout happens during an on-going SACK loss recovery
episode, the internal SACK accounting was not cleared.

Reported by: pho
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42162
2023-10-11 23:09:58 +02:00
Richard Scheffenegger
e2c6a6d29b tcp: include RFC6675 IsLost() in pipe calculation
Add more accounting while processing SACK data, to
keep track of when a packet is deemed lost using
the RFC6675 guidance.

Together with PRR (RFC6937) this allows a sender to
retransmit presumed lost packets faster, and loss
recovery to complete earlier.

Reviewed By: cc, rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D39299
2023-10-09 12:37:20 +02:00
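The RFC 6675 IsLost() test referred to in e2c6a6d29b can be sketched over a plain array of SACK blocks. This is an illustration under simplifying assumptions, not the scoreboard code: the struct and function names are hypothetical, blocks are assumed disjoint and non-adjacent, the sequence number being tested is assumed not to be SACKed itself, and 32-bit sequence wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One SACKed range [start, end). */
struct sack_block {
	uint32_t start;
	uint32_t end;
};

/*
 * A sequence number is deemed lost when either dupthresh discontiguous
 * SACKed ranges lie above it, or more than (dupthresh - 1) * smss bytes
 * above it have been SACKed.
 */
bool
is_lost(uint32_t seq, const struct sack_block *blk, size_t nblk,
    unsigned dupthresh, uint32_t smss)
{
	unsigned ranges = 0;
	uint64_t bytes = 0;

	assert(dupthresh > 0);
	for (size_t i = 0; i < nblk; i++) {
		if (blk[i].start > seq) {
			ranges++;
			bytes += blk[i].end - blk[i].start;
		}
	}
	return (ranges >= dupthresh ||
	    bytes > (uint64_t)(dupthresh - 1) * smss);
}
```

Data for which is_lost() returns true no longer needs to be counted as in flight, which is what lets the pipe estimate shrink and retransmissions go out sooner.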
Zhenlei Huang
dac91eb766 sctp: Various fixes for loader tunables
The following sysctl variables are actually loader tunables. Add sysctl
flag CTLFLAG_TUN to them so that `sysctl -T` will report them correctly.

 1. net.inet.sctp.tcbhashsize
 2. net.inet.sctp.pcbhashsize
 3. net.inet.sctp.chunkscale

The loader tunables 'net.inet.sctp.tcbhashsize' and 'net.inet.sctp.chunkscale'
are only used during vnet initialization, thus it makes no sense to make them
writable tunables.

Validate the values of loader tunables on vnet initialization, reset them to
their defaults if invalid to prevent potential kernel panics.

Reviewed by:	tuexen, #transport, #network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42007
2023-10-09 12:36:48 +08:00
Zhenlei Huang
38ecc80b2a tcp: Simplify the initialization of loader tunable 'net.inet.tcp.tcbhashsize'
No functional change intended.

Reviewed by:	cc, rscheff, #transport
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D41998
2023-10-08 18:03:59 +08:00
Michael Tuexen
abca3ae773 udp: fix sending of IPv4-mapped addresses
The inp_vflags field must be adjusted during the call of
in_pcbbind_setup(). This is consistent with the other places in the
code, but not elegant at all.

PR:			274009
Reported by:		syzbot+81ccc423a2737ed031ac@syzkaller.appspotmail.com
Reported by:		syzbot+c8e3dac881bba85bc029@syzkaller.appspotmail.com
Reviewed by:		markj, rrs, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D42031
2023-10-07 15:56:00 +02:00
Randall Stewart
8818f0f112 TCP: Fix a rack bug that syzkaller found which results in a crash.
So when we call the fast_rsm retransmit path, we should always move
snd_nxt back up to snd_max. In fact during ack-processing if snd_nxt
falls behind it should be moved up there as well. Otherwise what
can happen is we have an incorrect mark on snd_nxt and incorrectly
calculate the offset when we go through the front path (which is
what syzkaller was able to do) then when we go to clean up the
send the offset is all wrong and we crash.

Special thanks to Gleb for pointing out the problem and the email
that had the reproducer so I could find the issue.

Reported-by: syzbot+f5061a372f74f021ec02@syzkaller.appspotmail.com
Sponsored by: Netflix Inc
2023-10-04 15:16:01 -04:00
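The invariant stated in 8818f0f112 (snd_nxt should be moved back up to snd_max whenever it has fallen behind) can be sketched with a wraparound-safe sequence comparison. The struct, macro and function names are hypothetical and this is not the rack code; the macro follows the usual signed-difference idiom for 32-bit TCP sequence numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
#define	SEQ_BEFORE(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

struct send_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t snd_max;	/* highest sequence number sent so far */
};

/* If snd_nxt has fallen behind snd_max, pull it back up. */
void
sync_snd_nxt(struct send_state *s)
{
	assert(s != NULL);
	if (SEQ_BEFORE(s->snd_nxt, s->snd_max))
		s->snd_nxt = s->snd_max;
}
```

Because the comparison uses the signed difference, the check also works when snd_max has wrapped past zero while snd_nxt has not.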
Mark Johnston
d94d07d581 netdump: Check the return value of ifunit_ref()
We may fail to match if the specific interface doesn't exist or was
renamed.

PR:		273715
Reported by:	grembo
MFC after:	1 week
2023-10-02 08:09:26 -04:00
Olivier Certner
5817169bc4 Fix 'security.bsd.see_jail_proc' by using cr_bsd_visible()
As implemented, this security policy would only prevent seeing processes
in sub-jails, but would not prevent sending signals to, changing
priority of or debugging processes in these, enabling attacks where
unprivileged users could tamper with random processes in sub-jails in
particular circumstances (conflated UIDs) despite the policy being
enforced.

PR:                     272092
Reviewed by:            mhorne
MFC after:              2 weeks
Sponsored by:           Kumacom SAS
Differential Revision:  https://reviews.freebsd.org/D40628
2023-09-28 11:59:08 -03:00
Zhenlei Huang
f549e22901 ip_mroute: Fix sysctl knobs
The loader tunable `net.inet.ip.mfchashsize` does not have corresponding
sysctl MIB entry. Just add it.

While here, the sysctl variable `net.inet.pim.squelch_wholepkt` is actually
a loader tunable. Add sysctl flag CTLFLAG_TUN to it so that `sysctl -T`
will report it correctly.

Reviewed by:	kp
Fixes:		443fc3176d Introduce a number of changes to the MROUTING code
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D41997
2023-09-28 00:23:22 +08:00
Michael Tuexen
bb56b36d71 sctp: further improve shutting down the read side of a socket
Deal with the case that the association is already gone.

Reported by:	syzbot+e256d42e9b390564530a@syzkaller.appspotmail.com
MFC after:	3 days
2023-09-13 13:02:51 +02:00
Michael Tuexen
81c5f0fac9 sctp: improve shutting down the read side of a socket
When shutdown(..., SHUT_RD) or shutdown(..., SHUT_RDWR) is called,
really clean up the read queue and issue an ungraceful shutdown if
user messages are affected.

Reported by:	syzbot+d4e1d30d578891245f59@syzkaller.appspotmail.com
MFC after:	3 days
2023-09-13 01:36:14 +02:00
Cheng Cui
fafb03ab42 siftr: flush pkt_nodes to the log file in batch
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D41175
2023-09-11 11:23:27 -04:00
Zhenlei Huang
242fa308f3 carp: Explicitly mark tunable net.inet.carp.allow with CTLFLAG_NOFETCH
With recent change 110113bc08, a vnet tunable can be initialized when
there is a corresponding kernel environment variable unless it is marked
with the flag CTLFLAG_NOFETCH.

The initialization may happen during early boot (linker preload), at that
time vnet0 has not been created. The handler carp_allow_sysctl() for the
tunable net.inet.carp.allow requires vnet, thus invoking it during early
boot will cause kernel panic.

The tunable is initialized by vnet sysinit routine ipcarp_sysinit() so
let's just mark it with flag CTLFLAG_NOFETCH.

No functional change intended.

Fixes:		110113bc08 sysctl(9): Enable vnet sysctl variables to be loader tunable
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D41525
2023-09-09 16:10:32 +08:00
Michael Tuexen
1e81a4e7e8 sctp: don't call sctp_ulp_notify() recursively
This does not work with the new locking scheme.

MFC after:	3 days
2023-09-08 21:19:59 +02:00
Michael Tuexen
f9425b3a85 sctp: cleanup locking for notifications
All notifications are now queued via sctp_ulp_notify(). Do
the locking of the inp read lock there and validate this in all
functions being used.
This is one step in avoiding race conditions when closing the
read end of an SCTP socket.

MFC after:	3 days
2023-09-08 16:20:51 +02:00
Michael Tuexen
3ac7664774 sctp: make sure all SCTP RESET notifications use sctp_ulp_notify()
While there, improve consistency of the notification related code.
No functional change intended.

MFC after:	3 days
2023-09-08 14:19:56 +02:00
Michael Tuexen
cd3770c5fe sctp: cleanup SCTP AUTH related notification
This makes consistent use of the parameters and ensures that
all SCTP AUTH related notifications are using sctp_ulp_notify().

No functional change intended.

MFC after:	3 days
2023-09-08 13:13:43 +02:00
Zhenlei Huang
224aec05e7 tcp: Initialize the maximum number of entries in a client cookie cache bucket
This vnet loader tunable is defined with SYSCTL_PROC, thus will not be
initialized by the kernel on vnet creation and will always have the default
value TCP_FASTOPEN_CCACHE_BUCKET_LIMIT_DEFAULT.

Fix by fetching the value from the corresponding kernel environment during
vnet construction.

PR:		273509
Reviewed by:	#transport, tuexen
Fixes:	c560df6f12 This is an implementation of the client side of TCP Fast Open (TFO) [RFC7413]
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D41691
2023-09-03 04:34:07 +08:00
Kristof Provost
fa03d37432 mcast: fix memory leak in imf_purge()
The IGMP code buffers packets in the imf_inm->inm_scq mbufq, but does
not clear this queue when struct in_mfilter is freed by imf_purge().
This can cause memory leaks if IGMPv3 is used.

Purge the mbufq on imf_purge().

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D41629
2023-08-31 12:16:20 +02:00
Kristof Provost
b94ec00ba7 igmp: do not upgrade IGMP version beyond net.inet.igmp.default_version
IGMP requires hosts to use the lowest version they've seen on the
network. When the IGMP timers expire we take the opportunity to upgrade again.
However, we did not take the net.inet.igmp.default_version sysctl
setting into account, so we could end up switching to IGMPv3 even if the
user had requested IGMPv2 or IGMPv1 via the sysctl.

Check V_igmp_default_version before we upgrade the IGMP version.

Reviewed by:	adrian
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D41628
2023-08-30 09:22:05 +02:00
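The rule added by b94ec00ba7 amounts to capping the version chosen after the timers expire at the configured default. A minimal sketch, assuming a hypothetical helper that receives the lowest IGMP version still considered present on the network and the value of net.inet.igmp.default_version (this is not the igmp.c code):

```c
#include <assert.h>

/* Version to run once the older-version timers have expired. */
int
igmp_version_after_timeout(int lowest_seen, int default_version)
{
	assert(lowest_seen >= 1 && lowest_seen <= 3);
	assert(default_version >= 1 && default_version <= 3);
	/* Never upgrade past the administrator's configured version. */
	return (lowest_seen < default_version ? lowest_seen : default_version);
}
```

With default_version set to 2, a network on which only IGMPv3 has been seen still results in version 2, which is the behaviour the commit message asks for.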
Michael Tuexen
e40d16ad6e sctp: cleanup
In particular, don't use a socket level flag, use the inp level one.
After adding appropriate locking, this will close a race condition.

MFC after:	1 week
2023-08-25 17:31:19 +02:00
Michael Tuexen
f0c8e8118d sctp: cleanup
No functional change intended. Just asserting the conditions when
being called.

MFC after:	1 week
2023-08-25 17:26:58 +02:00
Michael Tuexen
847fa61fad sctp: improve handling of socket shutdown for reading
If a socket is marked as cannot read anymore, drop chunks which
should be added to a control element in the receive queue.
This is consistent with dropping control elements instead of
adding them in the same situation.

Reported by:	syzbot+291f6581cecb77097b16@syzkaller.appspotmail.com
MFC after:	1 week
2023-08-24 15:52:55 +02:00
Michael Tuexen
d18c845f99 sctp: improve handling of SHUTDOWN and SHUTDOWN ACK chunks
When handling a SHUTDOWN or SHUTDOWN ACK chunk detect if the peer
is violating the protocol by not having made sure all user messages
are received by the peer. If this situation is detected, abort the
association.

MFC after:	1 week
2023-08-23 08:36:15 +02:00
Marius Strobl
dc485b968d tcp_info: Add and export more FreeBSD-specific fields
This change adds struct tcp_info fields corresponding to the following
struct tcpcb ones:
- snd_una
- snd_max
- rcv_numsacks
- rcv_adv
- dupacks

Note that while both tcp_fill_info() and fill_tcp_info_from_tcb() are
extended accordingly, no counterpart of rcv_numsacks is available in
the cxgbe(4) TOE PCB, though.

Sponsored by:	NetApp, Inc. (originally)
2023-08-22 20:34:01 +02:00