8059 commits

Author SHA1 Message Date
John Baldwin
519981e3c0 tcp_output: Clear FIN if tcp_m_copym truncates output length
Reviewed by:	rscheff, tuexen, gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46824
2024-10-02 15:12:37 -04:00
Michael Tuexen
2eacb0841c tcp: small cleanup
No functional change intended.

Reviewed by:		cc, glebius, markj, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46850
2024-10-01 17:34:35 +02:00
Gleb Smirnoff
57671d5ccc tcp: further cleanup old options
They all were experimental and some comments refer to internal Netflix
versions.  There is no reason to leak that into the header. Style unused
options so that they have the available value aligned with really used
values.

Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D46779
2024-09-30 12:11:37 -07:00
Michael Tuexen
01eb635d12 tcp: improve mbuf handling when processing SYN segments
When the sysctl-variable net.inet.ip.accept_sourceroute is non-zero,
an mbuf would be leaked when processing a SYN-segment containing an
IPv4 strict or loose source routing option, when the on-stack
syncache entry is used or there is an error related to processing
TCP MD5 options.
Fix this by freeing the mbuf whenever an error occurred or the
on-stack syncache entry is used.

Reviewed by:		markj, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46839
2024-09-30 20:00:04 +02:00
Michael Tuexen
a2e4f45480 tcp: whitespace cleanup
No functional change intended.

Reported by:	markj
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2024-09-30 19:53:57 +02:00
Michael Tuexen
cbc9438f05 tcp: improve ref count handling when processing SYN
Don't leak a reference count for so->so_cred when processing an
incoming SYN segment with an on-stack syncache entry and the
sysctl variable net.inet.tcp.syncache.see_other is false.

Reviewed by:		cc, markj, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Pull Request:		https://reviews.freebsd.org/D46793
2024-09-28 22:06:41 +02:00
Michael Tuexen
78e1b031d2 tcp: improve MAC error handling for SYN segments
Don't leak a maclabel when SYN segments are processed which results
in an error due to MD5 signature handling.
Tweak the #ifdef MAC to allow additional upcoming changes.

Reviewed by:		markj
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46766
2024-09-26 08:10:01 +02:00
Gleb Smirnoff
a00c3a94bf tcp: remove remnants of 20+ year old disabled code from d912c694ee
Fixes:	90ad2dc287
2024-09-24 14:36:10 -07:00
Michael Tuexen
87fbd9fc7f tcp: remove unused socket option names
These IPPROTO_TCP-level socket option names correspond to socket
options, which are not implemented. So remove them.
Thanks to Peter Lei for suggesting this change.

Reviewed by:		rscheff, thj
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46623
2024-09-20 13:03:53 +02:00
Richard Scheffenegger
0a05ea1f56 tcp: keep syncache flags when updating ECN info
While processing the ECN flags of an incoming packet, the code
incorrectly cleared all other syncache flags.

Reported by: tuexen
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D46694
2024-09-18 11:18:30 +02:00
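
The kind of bug described in 0a05ea1f56 can be illustrated with a small bitmask sketch. This is not the syncache code: the flag names and values below are hypothetical, and the "buggy" variant is only one plausible shape of such a mistake. The point is that updating a sub-field of a flags word should clear only that field's bits before merging the new value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define EX_SCF_TIMESTAMP 0x01u
#define EX_SCF_SACK      0x02u
#define EX_SCF_ECN_MASK  0x30u	/* two-bit ECN field */

static_assert((EX_SCF_ECN_MASK & (EX_SCF_TIMESTAMP | EX_SCF_SACK)) == 0,
    "ECN field must not overlap the other flags");

/* Buggy shape: the whole word is overwritten, so unrelated flags are lost. */
static uint32_t
ex_update_ecn_buggy(uint32_t flags, uint32_t ecn)
{
	(void)flags;
	return (ecn & EX_SCF_ECN_MASK);
}

/* Fixed shape: clear only the ECN bits, then merge in the new ECN value. */
static uint32_t
ex_update_ecn(uint32_t flags, uint32_t ecn)
{
	return ((flags & ~EX_SCF_ECN_MASK) | (ecn & EX_SCF_ECN_MASK));
}
```

With both EX_SCF_TIMESTAMP and EX_SCF_SACK set, ex_update_ecn() keeps them while changing the ECN field, whereas the buggy variant returns only the ECN bits.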
Cheng Cui
ee45061051 cc_cubic: use newreno to emulate AIMD in TCP-friendly region
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D46546
2024-09-17 10:37:00 -04:00
Cheng Cui
b6c137de0a tcp cc: re-organize newreno functions into parts that can be re-used
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D46046
2024-09-17 09:54:17 -04:00
Michael Tuexen
1c6bb4c578 tcp: remove TCP_OFFLOAD_DISABLE
TCP_OFFLOAD_DISABLE is nowhere else used or defined. So remove it.
No functional change intended.

Reviewed by:		np
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46669
2024-09-15 11:44:49 +02:00
Michael Tuexen
1c60b2cb0b tcp: improve whitespace consistency for socket option names
No functional change intended.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46618
2024-09-10 13:03:50 +02:00
Michael Tuexen
e06cf0fc5d tcp: make tcp_lro_flush() static
tcp_lro_flush() is not used anymore outside of tcp_lro.c. Therefore
make it static.

Reviewed by:		rscheff, glebius, Peter Lei
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46435
2024-09-05 17:44:33 +02:00
Michael Tuexen
ef438f7706 tcp: improve consistency of syncache_respond() failure handling
When the initial sending of the SYN ACK segment using
syncache_respond() fails, it is handled as a permanent error.
To improve consistency, apply this policy in all cases, where
syncache_respond() is called. These include
* timer based retransmissions of the SYN ACK
* retransmitting a SYN ACK in response to a SYN retransmission
* sending of challenge ACKs in response to received RST segments
In these cases, fall back to SYN cookies, if enabled.
While there, also improve consistency of the TCP stats counters.

Reviewed by:		cc, glebius (earlier version)
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46428
2024-09-05 03:33:13 +02:00
Mark Johnston
7d508464f5 carp: Fix pullup checks
The conditions used to test whether a pullup is needed were inverted.

While here:
- Fix a bogus assignment to "iplen": it's already initialized to *offp.
- Use in_cksum_skip() instead of manually adjusting the data pointer.
  Otherwise the mbuf is temporarily in an invalid state, since m_len
  isn't updated to match.

Reported by:	KMSAN
Reviewed by:	kp
Sponsored by:	Klara, Inc.
Fixes:		3711515467 ("carp: support VRRPv3")
Differential Revision:	https://reviews.freebsd.org/D46492
2024-09-01 14:09:53 +00:00
Michael Tuexen
b2044c4557 tcp rack, bbr: improve handling of soft errors
Do not report an error, if it is stored as a soft error. This avoids,
for example, the dropping of TCP connections using an interface,
while enabling or disabling LRO on that interface.

Reviewed by:		cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46427
2024-08-30 09:26:41 +02:00
Mark Johnston
0c605af3f9 netinet: Explicitly disallow connections to the unspecified address
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error.  Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().

Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.

Reported by:	Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after:	2 weeks
Reviewed by:	zlei, allanjude, emaste
Differential Revision:	https://reviews.freebsd.org/D46454
2024-08-29 13:11:15 +00:00
Kristof Provost
b1c3a4d75f netipsec: add probe points for the ipsec/esp/ah/ipcomp counters
Extend what we did for netinet counters in 60d8dbbef0 (netinet: add a probe
point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters, 2024-01-18) to the
IPsec code.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D46416
2024-08-28 12:02:45 +02:00
Kristof Provost
3b62f33500 netinet: fix LINT-NOINET build failure
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-08-27 16:45:46 +02:00
Michael Tuexen
6e7581236e tcp: whitespace cleanup in enum tcp_log_events
No functional change intended.

Sponsored by:	Netflix, Inc.
2024-08-25 22:05:41 +02:00
Michael Tuexen
e41364711c tcp: improve consistency of SYN-cache handling
Originally, a SYN-cache entry was always allocated and later freed,
when not needed anymore. Then the allocation was avoided, when no
SYN-cache entry was needed, and a copy on the stack was used.
But the logic regarding freeing was not updated.
This patch doesn't re-check conditions (which may have changed) when
deciding to insert or free the entry, but uses the result of
the earlier check.
This simplifies the code and also improves consistency.

Reviewed by:		glebius
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46410
2024-08-22 22:17:05 +02:00
Michael Tuexen
498286d4e8 tcp: fix format of sysctl variable
The format for CTLTYPE_UINT is "IU" instead of "UI" as specified
in sysctl.9.

Reviewed by:		cc, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46408
2024-08-22 14:44:47 +02:00
Michael Tuexen
a1d9ce19b1 sctp: fix format of sysctl variables
MFC after:		1 week
2024-08-22 09:07:27 +02:00
Michael Tuexen
64443828bb tcp: fix list iteration in tcp_lro_flush_active()
Use LIST_FOREACH_SAFE(), since the list element is removed from
the list in the loop body, zeroed out, and inserted in the free list.

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46383
2024-08-21 00:07:37 +02:00
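
The reason 64443828bb needs LIST_FOREACH_SAFE() can be shown with a minimal, self-contained sketch. It does not use sys/queue.h or the LRO structures; the struct and function names are hypothetical. It shows the same idea the _SAFE macro variant relies on: the successor is saved before the current element is unlinked and its link field is reused for the free list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; stands in for an entry on an active list. */
struct ex_entry {
	int active;
	struct ex_entry *next;
};

/*
 * Move every node from *active onto *free_list and return how many
 * nodes were moved.
 */
static int
ex_flush_all(struct ex_entry **active, struct ex_entry **free_list)
{
	struct ex_entry *e, *tmp;
	int moved = 0;

	assert(active != NULL && free_list != NULL);
	for (e = *active; e != NULL; e = tmp) {
		tmp = e->next;		/* save successor before unlinking */
		e->active = 0;		/* "zero out" the entry */
		e->next = *free_list;	/* link field now belongs to free list */
		*free_list = e;
		moved++;
	}
	*active = NULL;
	return (moved);
}
```

If the loop advanced with e = e->next instead of the saved tmp, it would follow the pointer into the free list; starting from an empty free list it would stop after the first element.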
Mark Johnston
417b35a97b netinet: Add a sysctl to allow disabling connections to INADDR_ANY
See the discussion in Bugzilla PR 280705 for context.

PR:		280705
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D46259
2024-08-20 21:31:57 +00:00
Michael Tuexen
aa6c490bf8 tcp: initialize the LRO hash table with correct size
There will be at most lro_entries entries in the LRO hash table. So no
need to take lro_mbufs into account, which only results in the
LRO hash table being too large and therefore wasting memory.

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46378
2024-08-20 17:30:55 +02:00
Eugene Grosbein
e5b8538083 libalias: add another check to previous change
If UseLink() returns NULL, it is possible that DeleteLink()
has already freed "grp", so check it out carefully.

PR:		269770
Reported by:	Peter Much
X-MFC-With:	8132e95909
2024-08-20 21:04:13 +07:00
Eugene Grosbein
8132e95909 libalias: fix subtle racy problem in outside-inside forwarding
sys/netinet/libalias/alias_db.c has internal static function UseLink()
that passes a link to CleanupLink() to verify if the link has expired.
If so, UseLink() may return NULL.

_FindLinkIn()'s usage of UseLink() is not quite correct.

Assume there is "redirect_port udp" configured to forward incoming
traffic for specific port to some internal address.
Such a rule creates partially specified permanent link.

After first such packet libalias creates new fully specified
temporary LINK_UDP with default timeout 60 seconds.
Also, in case of low traffic libalias may assign "timestamp"
for this new temporary link way in the past because
LibAliasTime is updated seldom and can keep old value
for tens of seconds, and it will be used for the temporary link.

It may happen that next incoming packet for redirected port
passed to _FindLinkIn() results in a call to UseLink()
that returns NULL due to detected expiration.
Immediate return of NULL results in broken translation:
either a packet is dropped (deny_incoming mode) or delivered to
original destination address instead of internal one.

Fix it with additional check for NULL to proceed with a search
for original partially specified link. In case of UDP,
it also recreates temporary fully specified link
with a call to ReLink().

Practical examples are "redirect_port udp" rules for unidirectional
SYSLOG protocol (port 514) or some low volume VPN encapsulated in UDP.

Thanks to Peter Much for initial analysis and first version of a patch.

Reported by:	Peter Much <pmc@citylink.dinoex.sub.org>
PR:		269770
MFC after:	1 week
2024-08-19 10:34:37 +07:00
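
The lookup fallback described in 8132e95909 can be sketched roughly as follows. This is not the libalias code: the struct, the field names, and both functions are hypothetical stand-ins for UseLink() and _FindLinkIn(), and the ReLink() step for UDP is left out. The sketch only shows the control flow: an expired temporary link should not end the search, because the permanent, partially specified link may still match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link record; times are in seconds. */
struct ex_link {
	int expire_at;	/* ignored for permanent links */
	int permanent;	/* nonzero for partially specified, permanent links */
	int target;	/* stand-in for the internal address */
};

/* Stand-in for UseLink(): returns NULL when the link is missing or expired. */
static const struct ex_link *
ex_use_link(const struct ex_link *l, int now)
{
	if (l == NULL)
		return (NULL);
	if (!l->permanent && now >= l->expire_at)
		return (NULL);
	return (l);
}

/* Stand-in for the fixed lookup: fall back instead of returning NULL early. */
static const struct ex_link *
ex_find_in(const struct ex_link *full, const struct ex_link *partial, int now)
{
	const struct ex_link *l;

	/* In this sketch the partially specified link is always permanent. */
	assert(partial == NULL || partial->permanent);

	l = ex_use_link(full, now);
	if (l != NULL)
		return (l);
	/* The temporary link expired: continue with the partial link. */
	return (ex_use_link(partial, now));
}
```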
Cheng Cui
8cc528c682 tcp cc: clean up some un-used cc_var flags
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46299
2024-08-15 09:33:04 -04:00
Michael Tuexen
9b569353e0 tcp: initialize V_ts_offset_secret for all vnets
Initialize V_ts_offset_secret for each vnet, not only for the
default vnet, since it is vnet specific.

Reviewed by:		Peter Lei
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46246
2024-08-09 16:12:22 +02:00
Randall Stewart
872164f559 Non-tested experimental code removal.
There is a new feature that came in with the last sync to the rack stack that should not have
been released. It is untested and may not work well. It currently is off by default, which is good
but it is best to remove it until such time that it can be vetted and tuned to actually work :)

This change removes just the experimental feature for now. It can make an appearance in the future
when it is proofed out.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D45410
2024-08-09 09:01:57 -04:00
Michael Tuexen
c349e881cf rack, bbr: cleanup ack throttling
Use the variable in the TCPCB, not the one in the stack specific
data structure. This simplifies the code and brings the functionality
to BBR without any change.

Reviewed by:		Peter Lei, cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46068
2024-08-07 20:25:53 +02:00
Michael Tuexen
b3bc746cf3 tcp: minor cleanup
The vnet component of struct tcp_syncache is only used if
VIMAGE is defined.
No functional change intended.

Reviewed by:		cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46240
2024-08-07 19:43:07 +02:00
Michael Tuexen
093d9b46f4 ddb: update printing of t_flags and t_flags2
Update the ddb printing of t_flags and t_flags2 to the current state of
definitions in tcp_var.h.

Reviewed by:		cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46222
2024-08-05 11:17:30 +02:00
Andrew Gallatin
1f628be888 tcp_ratelimit: provide an api for drivers to release ratesets at detach
When the kernel is compiled with options RATELIMIT, the
mlx5en driver cannot detach. It gets stuck waiting for all
kernel users of its rates to drop to zero before finally calling
ether_ifdetach.

The tcp ratelimit code has an eventhandler for ifnet departure
which causes rates to be released. However, this is called as an
ifnet departure eventhandler, which is invoked as part of
ifdetach(), via ether_ifdetach(). This means that the tcp
ratelimit code holds down many hw rates when the mlx5en driver
is waiting for the rate count to go to 0. Thus devctl detach
will deadlock on mlx5 with this stack:
mi_switch+0xcf sleepq_timedwait+0x2f _sleep+0x1a3 pause_sbt+0x77 mlx5e_destroy_ifp+0xaf mlx5_remove_device+0xa7 mlx5_unregister_device+0x78 mlx5_unload_one+0x10a remove_one+0x1e linux_pci_detach_device+0x36 linux_pci_detach+0x24 device_detach+0x180 devctl2_ioctl+0x3dc devfs_ioctl+0xbb vn_ioctl+0xca devfs_ioctl_f+0x1e kern_ioctl+0x1c3 sys_ioctl+0x10a

To fix this, provide an explicit API for a driver to call the tcp
ratelimit code telling it to detach itself from an ifnet. This
allows the mlx5 driver to unload cleanly. I considered adding an
ifnet pre-departure eventhandler. However, that would need to be
invoked by the driver, so a simple function call seemed better.

The mlx5en driver has been updated to call this function.

Reviewed by: kib, rrs

Differential Revision:	https://reviews.freebsd.org/D46221
Sponsored by: Netflix
2024-08-05 12:51:35 -04:00
Michael Tuexen
d6fb9f8ca3 tcp: inherit CC algorithm from listener
Like any other parameter, the CC algorithm should be inherited from
the listener.

Reviewed by:		cc
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46143
2024-08-03 22:56:39 +02:00
Michael Tuexen
fd53594ae7 tcp: retire sysctl variable functions_inherit_listen_socket_stack
The default was true and it is consistent to inherit the TCP function
block from the listener as most of the other parameters.

Reviewed by:		Peter Lei, cc
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46141
2024-08-03 22:52:17 +02:00
Michael Tuexen
52eacec95d tcp: fix t_flags2 collision
Fix the collision of TF2_IPSEC_TSO and TF2_NO_ISS_CHECK

Fixes:			646c28ea80 ("tcp: improve SEG.ACK validation")
MFC after:		1 week
Sponsored by:		Netflix, Inc.
2024-08-03 21:49:18 +02:00
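
A flag collision like the one fixed in 52eacec95d happens when two names in the same flags word share a bit. The sketch below is an assumption about how such a collision could be detected, not something taken from the tree; the helper name is hypothetical and the values in the usage note are made up rather than the real TF2_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if no two entries in flags[] share a bit, 0 otherwise.
 * Each entry is expected to be the value of one flag definition.
 */
static int
ex_flags_disjoint(const uint32_t *flags, size_t n)
{
	uint32_t seen = 0;

	assert(flags != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((flags[i] & seen) != 0)
			return (0);	/* bit already claimed by an earlier flag */
		seen |= flags[i];
	}
	return (1);
}
```

Called on a made-up set such as { 0x1, 0x2, 0x2 }, the helper returns 0 because the last two entries claim the same bit; on { 0x1, 0x2, 0x4 } it returns 1.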
Michael Tuexen
101a0f09e8 sctp: improve input validation for data chunks
fsn_included should only be considered, if first_frag_seen is true.
Also, fix the resetting of the control structure, if stream queues
are flushed.
This fixes a bug where a legitimate message sequence was incorrectly
classified as illegitimate.
Thanks to Victor Boivie for reporting the issue on the userland
stack.

MFC after:		3 days
2024-08-03 13:27:18 +02:00
Michael Tuexen
4d32367a44 Revert "udp: improve handling of cached route"
This reverts commit 7186765300.
Two tests of the test suite are failing. Reverting the change
until it is improved.
2024-07-30 11:46:27 +02:00
Michael Tuexen
7186765300 udp: improve handling of cached route
The inp_route pointer should only be provided to the network
layer, when no destination address is provided. This is only
one of the conditions, where a write lock is needed.
If, for example, the route is also cached, when the socket is
unbound, problems show up, when the sendto is called, then
connect and finally send, when the route for the addresses
provided in the sendto and connect call use different outgoing
interfaces.
While there, clearly document why the write lock is taken.

Reported by:		syzbot+59122d2e848087d3355a@syzkaller.appspotmail.com
Reviewed by:		Peter Lei, glebius
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46056
2024-07-28 23:36:48 +02:00
Michael Tuexen
4036380e02 tcp: vnetify sysctl variables ack_war_timewindow and ack_war_cnt
As suggested by glebius@. While there, improve the documentation.

Reviewed by:		Peter Lei, cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46140
2024-07-28 22:36:34 +02:00
Michael Tuexen
00d3b74406 tcp cc: remove non-working sctp support
As suggested by lstewart, remove the non-working SCTP support in the
TCP congestion control modules. SCTP has a similar functionality
(although not using kernel loadable modules), on which the TCP stuff
was built, but the integration was never done.
No functional change intended.

Reviewed by:		Peter Lei, cc
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46142
2024-07-28 22:25:48 +02:00
Michael Tuexen
40299c55a0 tcp: implement challenge ACK throttling for the base stack
Implement ACK throttling of challenge ACKs as described in RFC 5961.

Reviewed by:		Peter Lei, rscheff, cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46066
2024-07-25 13:54:52 +02:00
Cheng Cui
9565854ab4 cc_cubic: remove the redundant variable num_cong_events from struct cubic.
Summary:
This variable was added by commit eb5bfdd065, but is not needed.
No functional change.

Reviewed by: tuexen

Differential Revision: https://reviews.freebsd.org/D46042
2024-07-25 13:11:32 -04:00
Michael Tuexen
7f2411b181 tcp: improve whitespace consistency
No functional change.

Sponsored by:	Netflix, Inc.
2024-07-22 08:59:45 +02:00
Michael Tuexen
37b3e6a660 tcp: use TCP_MAXWIN instead of 65535
This is suggested by cc@. No functional change.

Sponsored by:	Netflix, Inc.
2024-07-22 08:52:12 +02:00
Konrad Witaszczyk
bc06c51419 netinet: correct SIOCDIFADDR{,_IN6} calls to use {,in6_}ifreq
The SIOCDIFADDR{,_IN6} ioctls take an ifreq structure object, not an
ifaliasreq/in_aliasreq/in6_aliasreq structure object, as their argument.
As opposed to ifaliasreq/in_aliasreq/in6_aliasreq used by
SIOCAIFADDR{,_IN6}, the ifreq/in6_ifreq structures used by the
SIOCDIFADDR{,_IN6} ioctls do not include a separate field for a
broadcast address and other values required to add an address to a
network interface with SIOCAIFADDR{,_IN6}.

Whilst this issue is not specific to CHERI-extended architectures, it
was first observed on CheriBSD running on Arm Morello. For example,
incorrect calls using the in6_aliasreq object result in CHERI capability
violations. A pointer to the ifra_addr field in in6_aliasreq cast to the
ifru_addr union member of in6_ifreq results in bounds being set to the
union's larger size. Such bounds exceed the bounds of the in6_aliasreq
object and the bounds-setting instruction clears a tag of the object's
capability.

Reviewed by:	brooks, kp, oshogbo
Accepted by:	oshogbo (mentor)
Reported by:	CHERI
Obtained from:	CheriBSD
Differential Revision: https://reviews.freebsd.org/D46016
2024-07-22 14:17:21 +00:00