They were all experimental, and some comments refer to internal Netflix
versions. There is no reason to leak that into the header. Style the
unused options so that their available values are aligned with the
values that are actually in use.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46779
When the sysctl-variable net.inet.ip.accept_sourceroute is non-zero,
an mbuf would be leaked when processing a SYN-segment containing an
IPv4 strict or loose source routing option, when the on-stack
syncache entry is used or there is an error related to processing
TCP MD5 options.
Fix this by freeing the mbuf whenever an error occurred or the
on-stack syncache entry is used.
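A minimal user-space sketch of the ownership rule described above. It is
not the syncache code: process_syn(), buf_alloc(), buf_free() and
struct entry are hypothetical stand-ins, and a heap buffer plays the
role of the mbuf holding the IP options. The point is only that every
path that does not hand the buffer to a heap-allocated entry frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_bufs;	/* outstanding buffers, tracked for the demo only */

static void *
buf_alloc(void)
{
	void *p = malloc(64);

	if (p != NULL)
		live_bufs++;
	return (p);
}

static void
buf_free(void *p)
{
	if (p != NULL) {
		free(p);
		live_bufs--;
	}
}

struct entry {
	void *ipopts;	/* owned by the entry once stored */
};

/*
 * Hypothetical stand-in for SYN processing.  "opts" is consumed in
 * every case: it is kept only if a heap entry is returned, otherwise
 * it is freed before returning.
 */
static struct entry *
process_syn(void *opts, bool use_stack_entry, bool md5_error)
{
	struct entry *e = NULL;

	if (md5_error || use_stack_entry)
		goto done;
	e = malloc(sizeof(*e));
	if (e == NULL)
		goto done;
	e->ipopts = opts;
	opts = NULL;		/* ownership moved to the entry */
done:
	buf_free(opts);		/* no-op if ownership was transferred */
	return (e);
}
```

Funnelling all exits through one label keeps the free in a single place,
so a newly added error path cannot forget it.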
Reviewed by: markj, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46839
Don't leak a reference count for so->so_cred when processing an
incoming SYN segment with an on-stack syncache entry and the
sysctl variable net.inet.tcp.syncache.see_other is false.
Reviewed by: cc, markj, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Pull Request: https://reviews.freebsd.org/D46793
Don't leak a maclabel when processing a SYN segment results in an
error due to MD5 signature handling.
Tweak the #ifdef MAC to allow additional upcoming changes.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46766
These IPPROTO_TCP-level socket option names correspond to socket
options that are not implemented. So remove them.
Thanks to Peter Lei for suggesting this change.
Reviewed by: rscheff, thj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46623
While processing the ECN flags of an incoming packet, all other
syncache flags were incorrectly cleared.
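The commit text does not say how the other flags were lost. One common
way such a bug arises is assigning to the flag word instead of updating
only the ECN bits. The sketch below illustrates that pattern under this
assumption; the SCF_DEMO_* values are hypothetical and do not match the
real syncache flag definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout, for illustration only. */
#define	SCF_DEMO_NOOPT		0x01u
#define	SCF_DEMO_WINSCALE	0x02u
#define	SCF_DEMO_ECN_MASK	0x30u

/* Buggy variant: plain assignment drops every non-ECN flag. */
static uint32_t
set_ecn_buggy(uint32_t flags, uint32_t ecn_bits)
{
	(void)flags;
	return (ecn_bits & SCF_DEMO_ECN_MASK);
}

/* Fixed variant: clear only the ECN bits, then set the new ones. */
static uint32_t
set_ecn_fixed(uint32_t flags, uint32_t ecn_bits)
{
	flags &= ~SCF_DEMO_ECN_MASK;
	flags |= ecn_bits & SCF_DEMO_ECN_MASK;
	return (flags);
}
```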
Reported by: tuexen
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D46694
TCP_OFFLOAD_DISABLE is not used or defined anywhere else. So remove it.
No functional change intended.
Reviewed by: np
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46669
tcp_lro_flush() is not used anymore outside of tcp_lro.c. Therefore
make it static.
Reviewed by: rscheff, glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46435
When the initial sending of the SYN ACK segment using
syncache_respond() fails, it is handled as a permanent error.
To improve consistency, apply this policy in all cases, where
syncache_respond() is called. These include
* timer based retransmissions of the SYN ACK
* retransmitting a SYN ACK in response to a SYN retransmission
* sending of challenge ACKs in response to received RST segments
In these cases, fall back to SYN cookies, if enabled.
While there, also improve consistency of the TCP stats counters.
Reviewed by: cc, glebius (earlier version)
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46428
The conditions used to test whether a pullup is needed were inverted.
While here:
- Fix a bogus assignment to "iplen": it's already initialized to *offp.
- Use in_cksum_skip() instead of manually adjusting the data pointer.
Otherwise the mbuf is temporarily in an invalid state, since m_len
isn't updated to match.
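As a reminder of which way the pullup test has to point, here is a
simplified sketch. struct demo_buf and needs_pullup() are hypothetical;
only the contiguous length of the first buffer (the m_len analogue) is
modelled, not a real mbuf chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an mbuf: only the contiguous length matters. */
struct demo_buf {
	size_t len;	/* bytes available contiguously, like m_len */
};

/*
 * A pullup is needed when the bytes [off, off + hdrlen) are NOT all
 * contiguous, i.e. when the buffer is too short.  Testing for
 * "len >= off + hdrlen" here would be the inverted condition.
 */
static bool
needs_pullup(const struct demo_buf *b, size_t off, size_t hdrlen)
{
	return (b->len < off + hdrlen);
}
```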
Reported by: KMSAN
Reviewed by: kp
Sponsored by: Klara, Inc.
Fixes: 3711515467 ("carp: support VRRPv3")
Differential Revision: https://reviews.freebsd.org/D46492
Do not report an error, if it is stored as a soft error. This avoids,
for example, the dropping of TCP connections using an interface,
while enabling or disabling LRO on that interface.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46427
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().
Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 2 weeks
Reviewed by: zlei, allanjude, emaste
Differential Revision: https://reviews.freebsd.org/D46454
Extend what we did for netinet counters in 60d8dbbef0 (netinet: add a probe
point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters, 2024-01-18) to the
IPsec code.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D46416
Originally, a SYN-cache entry was always allocated and later freed,
when not needed anymore. Then the allocation was avoided, when no
SYN-cache entry was needed, and a copy on the stack was used.
But the logic regarding freeing was not updated.
This patch doesn't re-check conditions (which may have changed) when
deciding to insert or free the entry, but uses the result of
the earlier check.
This simplifies the code and also improves consistency.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46410
The format for CTLTYPE_UINT is "IU", not "UI", as specified
in sysctl.9.
Reviewed by: cc, zlei
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46408
Use LIST_FOREACH_SAFE(), since the list element is removed from
the list in the loop body, zeroed out, and inserted into the free list.
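A small user-space sketch of why the _SAFE variant matters here. The
names demo_entry, demo_head and flush_all() are hypothetical; the
fallback macro is only for C libraries whose <sys/queue.h> lacks
LIST_FOREACH_SAFE() and follows the usual BSD definition.

```c
#include <assert.h>
#include <string.h>
#include <sys/queue.h>

/* Some libcs ship <sys/queue.h> without the _SAFE variant. */
#ifndef LIST_FOREACH_SAFE
#define	LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = (head)->lh_first;					\
	    (var) != NULL && ((tvar) = (var)->field.le_next, 1);	\
	    (var) = (tvar))
#endif

struct demo_entry {
	LIST_ENTRY(demo_entry) next;
	int payload;
};
LIST_HEAD(demo_head, demo_entry);

/* Move every entry from "active" to "free_list", zeroing it on the way. */
static int
flush_all(struct demo_head *active, struct demo_head *free_list)
{
	struct demo_entry *e, *tmp;
	int moved = 0;

	LIST_FOREACH_SAFE(e, active, next, tmp) {
		LIST_REMOVE(e, next);
		memset(e, 0, sizeof(*e));	/* wipes the link fields too */
		LIST_INSERT_HEAD(free_list, e, next);
		moved++;
	}
	return (moved);
}
```

With plain LIST_FOREACH() the loop would read e's next pointer after the
element has been zeroed and relinked, so it would walk the free list
instead of the remaining active entries.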
Reviewed by: rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46383
There will be at most lro_entries entries in the LRO hash table. So no
need to take lro_mbufs into account, which only results in the
LRO hash table being too large and therefore wasting memory.
Reviewed by: rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46378
If UseLink() returns NULL, it is possible that DeleteLink()
has already freed "grp", so check for that carefully.
PR: 269770
Reported by: Peter Much
X-MFC-With: 8132e95909
sys/netinet/libalias/alias_db.c has an internal static function,
UseLink(), that passes a link to CleanupLink() to verify whether the
link has expired. If so, UseLink() may return NULL.
_FindLinkIn()'s usage of UseLink() is not quite correct.
Assume there is a "redirect_port udp" rule configured to forward
incoming traffic for a specific port to some internal address.
Such a rule creates a partially specified permanent link.
After the first such packet, libalias creates a new fully specified
temporary LINK_UDP with the default timeout of 60 seconds.
Also, in case of low traffic, libalias may assign a "timestamp"
way in the past to this new temporary link, because
LibAliasTime is updated seldom and can keep an old value
for tens of seconds, and that value is used for the temporary link.
It may happen that the next incoming packet for the redirected port
passed to _FindLinkIn() results in a call to UseLink()
that returns NULL due to the detected expiration.
An immediate return of NULL results in broken translation:
either the packet is dropped (deny_incoming mode) or it is delivered
to the original destination address instead of the internal one.
Fix it with an additional check for NULL to proceed with a search
for the original partially specified link. In case of UDP,
it also recreates the temporary fully specified link
with a call to ReLink().
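The control flow of the fix can be sketched roughly as follows. This is
not the libalias code: struct demo_link, use_link() and find_link() are
hypothetical simplifications, time is a plain integer, and the
recreation of the temporary link via ReLink() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_link {
	bool valid;
	bool permanent;
	int expire_at;	/* ignored for permanent links */
};

/* Return the link if it is still usable, NULL if it has expired. */
static struct demo_link *
use_link(struct demo_link *l, int now)
{
	if (!l->permanent && now >= l->expire_at) {
		l->valid = false;	/* stands in for deleting the link */
		return (NULL);
	}
	return (l);
}

/* Prefer the fully specified link, but fall back to the partial one. */
static struct demo_link *
find_link(struct demo_link *full, struct demo_link *partial, int now)
{
	struct demo_link *l;

	if (full != NULL && full->valid) {
		l = use_link(full, now);
		if (l != NULL)
			return (l);
		/* Expired: do not return NULL, keep searching below. */
	}
	if (partial != NULL && partial->valid)
		return (use_link(partial, now));
	return (NULL);
}
```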
Practical examples are "redirect_port udp" rules for the unidirectional
SYSLOG protocol (port 514) or some low-volume VPN encapsulated in UDP.
Thanks to Peter Much for initial analysis and first version of a patch.
Reported by: Peter Much <pmc@citylink.dinoex.sub.org>
PR: 269770
MFC after: 1 week
Initialize V_ts_offset_secret for each vnet, not only for the
default vnet, since it is vnet specific.
Reviewed by: Peter Lei
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46246
There is a new feature that came in with the last sync to the rack stack that should not have
been released. It is untested and may not work well. It is currently off by default, which is good,
but it is best to remove it until such time that it can be vetted and tuned to actually work :)
This change removes just the experimental feature for now. It can make an appearance in the future
when it is proven out.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D45410
Use the variable in the TCPCB, not the one in the stack specific
data structure. This simplifies the code and brings the functionality
to BBR without any change.
Reviewed by: Peter Lei, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46068
The vnet component of struct tcp_syncache is only used if
VIMAGE is defined.
No functional change intended.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46240
Update the ddb printing of t_flags and t_flags2 to the current state of
definitions in tcp_var.h.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46222
When the kernel is compiled with options RATELIMIT, the
mlx5en driver cannot detach. It gets stuck waiting for all
kernel users of its rates to drop to zero before finally calling
ether_ifdetach.
The tcp ratelimit code has an eventhandler for ifnet departure
which causes rates to be released. However, this is called as an
ifnet departure eventhandler, which is invoked as part of
ifdetach(), via ether_ifdetach(). This means that the tcp
ratelimit code holds down many hw rates when the mlx5en driver
is waiting for the rate count to go to 0. Thus devctl detach
will deadlock on mlx5 with this stack:
mi_switch+0xcf
sleepq_timedwait+0x2f
_sleep+0x1a3
pause_sbt+0x77
mlx5e_destroy_ifp+0xaf
mlx5_remove_device+0xa7
mlx5_unregister_device+0x78
mlx5_unload_one+0x10a
remove_one+0x1e
linux_pci_detach_device+0x36
linux_pci_detach+0x24
device_detach+0x180
devctl2_ioctl+0x3dc
devfs_ioctl+0xbb
vn_ioctl+0xca
devfs_ioctl_f+0x1e
kern_ioctl+0x1c3
sys_ioctl+0x10a
To fix this, provide an explicit API for a driver to call the tcp
ratelimit code telling it to detach itself from an ifnet. This
allows the mlx5 driver to unload cleanly. I considered adding an
ifnet pre-departure eventhandler. However, that would need to be
invoked by the driver, so a simple function call seemed better.
The mlx5en driver has been updated to call this function.
Reviewed by: kib, rrs
Differential Revision: https://reviews.freebsd.org/D46221
Sponsored by: Netflix
Like any other parameter, the CC algorithm should be inherited from
the listener.
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46143
The default was true, and it is consistent to inherit the TCP function
block from the listener, like most of the other parameters.
Reviewed by: Peter Lei, cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46141
fsn_included should only be considered, if first_frag_seen is true.
Also, fix the resetting of the control structure, if stream queues
are flushed.
This fixes a bug where a legitimate message sequence was incorrectly
classified as illegitimate.
Thanks to Victor Boivie for reporting the issue on the userland
stack.
MFC after: 3 days
The inp_route pointer should only be provided to the network
layer when no destination address is provided. This is only
one of the conditions under which a write lock is needed.
If, for example, the route is also cached while the socket is
unbound, problems show up when sendto() is called, then
connect(), and finally send(), and the routes for the addresses
provided in the sendto() and connect() calls use different outgoing
interfaces.
While there, clearly document why the write lock is taken.
Reported by: syzbot+59122d2e848087d3355a@syzkaller.appspotmail.com
Reviewed by: Peter Lei, glebius
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46056
As suggested by glebius@. While there, improve the documentation.
Reviewed by: Peter Lei, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46140
As suggested by lstewart, remove the non-working SCTP support in the
TCP congestion control modules. SCTP has similar functionality
(although not using kernel loadable modules), on which the TCP code
was built, but the integration was never done.
No functional change intended.
Reviewed by: Peter Lei, cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46142
Implement ACK throttling of challenge ACKs as described in RFC 5961.
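For readers unfamiliar with the idea, ACK throttling bounds how many
challenge ACKs are sent per time interval. The sketch below is a generic
fixed-window limiter, not the code added by this change; struct
ack_limiter, challenge_ack_allowed(), and the limit and window values
used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct ack_limiter {
	long window_start;	/* start of the current window, in seconds */
	long window_len;	/* window length, in seconds */
	unsigned int limit;	/* max challenge ACKs per window */
	unsigned int sent;	/* challenge ACKs sent in this window */
};

/* Return true if a challenge ACK may be sent at time "now". */
static bool
challenge_ack_allowed(struct ack_limiter *l, long now)
{
	if (now - l->window_start >= l->window_len) {
		/* A new window begins: reset the counter. */
		l->window_start = now;
		l->sent = 0;
	}
	if (l->sent >= l->limit)
		return (false);	/* throttled: drop silently */
	l->sent++;
	return (true);
}
```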
Reviewed by: Peter Lei, rscheff, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46066
This variable was added by commit eb5bfdd065, but it is not needed.
No functional change.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46042
The SIOCDIFADDR{,_IN6} ioctls take an ifreq structure object, not an
ifaliasreq/in_aliasreq/in6_aliasreq structure object, as their argument.
As opposed to ifaliasreq/in_aliasreq/in6_aliasreq used by
SIOCAIFADDR{,_IN6}, the ifreq/in6_ifreq structures used by the
SIOCDIFADDR{,_IN6} ioctls do not include a separate field for a
broadcast address and other values required to add an address to a
network interface with SIOCAIFADDR{,_IN6}.
Whilst this issue is not specific to CHERI-extended architectures, it
was first observed on CheriBSD running on Arm Morello. For example,
incorrect calls using the in6_aliasreq object result in CHERI capability
violations. A pointer to the ifra_addr field in in6_aliasreq cast to the
ifru_addr union member of in6_ifreq results in bounds being set to the
union's larger size. Such bounds exceed the bounds of the in6_aliasreq
object and the bounds-setting instruction clears a tag of the object's
capability.
Reviewed by: brooks, kp, oshogbo
Accepted by: oshogbo (mentor)
Reported by: CHERI
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D46016