7881 commits

Author SHA1 Message Date
Franco Fichtner
38104a2f6e dummynet: passin after dispatch
Based on a patch originally found in m0n0wall, expanded
to IPv6 and aligned with FreeBSD's IP input path.

The limit may not be correctly accounted for on the WAN
interface due to dummynet counting the packet again even
though it was already processed.

The problem here is that there's no proper way to reinject
the packet at the point where it was previously removed,
so we assume that IP input was already done (including
pfil) and move more or less directly to packet output
processing.

While here, move the passin label up to take the extra check
while avoiding a second label.  Also remove the spurious tag
read for the forward check, since we don't use it and we
should really trust the mbuf flag.
2024-06-03 11:06:53 +02:00
Marko Zec
782f020042 fib_dxr: check if cached fib_data matches the new request in dxr_init()
When calling dxr_init(), the FIB_ALGO infrastructure may provide a
pointer to a previous dxr instance, which permits reuse of auxiliary
dxr structures, i.e. incremental lookup structure updates.  For dxr this
is a crucial feature provided by FIB_ALGO, since dxr incremental updates
are typically several orders of magnitude faster than full lookup table
rebuilds.

However, the auxiliary dxr structure caches a pointer to struct fib_data and
relies upon it for performing incremental updates.  Apparently, incremental
rebuild requests from FIB_ALGO, i.e. calls to dxr_init() with the old_data
pointer set, may (under not yet fully understood circumstances) be invoked
within a different fib_data context than the one cached in the previous
version of dxr auxiliary structures.  In such (rare) events, we ignore the
offered old dxr context, and proceed with a full lookup structure rebuild
instead of attempting an incremental one using a fib_data context which
may no longer be valid, and could thus lead to a system crash.

PR:		278422
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit 4ab122e8ef127d36d95f874e85600c36c87c8c22)
(cherry picked from commit d6e32525c7)
2024-05-23 06:29:22 +02:00
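
The check described in this commit can be pictured with a minimal sketch. The struct layouts and the helper name `ex_dxr_can_reuse()` are hypothetical stand-ins, not the actual fib_dxr code; the sketch only shows the idea of comparing the cached fib_data pointer with the one in the new request before taking the incremental path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structures are much larger. */
struct ex_fib_data {
	int fibnum;
};

struct ex_dxr_aux {
	struct ex_fib_data *fd;	/* context cached by the previous instance */
};

/*
 * Return true only when old auxiliary state exists and was built for
 * the same fib_data context as the new request.  Otherwise the caller
 * should ignore the old state and do a full rebuild.
 */
static bool
ex_dxr_can_reuse(const struct ex_dxr_aux *old_da, const struct ex_fib_data *fd)
{
	return (old_da != NULL && old_da->fd == fd);
}
```

A caller would pick the incremental update when this returns true and fall back to a full lookup structure rebuild otherwise.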
Marko Zec
0e5e6a9419 fib_dxr: s/KASSERT/MPASS/
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit 1261fc325c)
2024-05-23 06:28:30 +02:00
Marko Zec
cf879fdb48 fib_dxr: KASSERTs for chasing NULL ptr and runaway refcount suspects
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit 52075e4cfa)
2024-05-23 06:28:02 +02:00
Marko Zec
47fb63a288 fib_dxr: move the bulk of malloc() failure logging into dxr_build()
Approved by:    re (cperciva)

(cherry picked from commit e474704b9c)
2024-05-23 06:27:33 +02:00
Marko Zec
4df0c59feb fib_dxr: update comment.
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit df376a714a)
2024-05-23 06:27:07 +02:00
Marko Zec
78782f3dd2 fib_dxr: free() does nothing if arg is NULL, so remove a redundant check.
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit 64136682ba)
2024-05-23 06:26:37 +02:00
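
As a userland illustration of the point made in this commit: standard C `free()` is defined to do nothing when passed NULL, so a guard around it is redundant. The struct and function names below are hypothetical and are not taken from fib_dxr.

```c
#include <stdlib.h>

/* Hypothetical container with two optionally allocated tables. */
struct ex_tables {
	int *range_tbl;
	int *x_tbl;
};

static void
ex_tables_release(struct ex_tables *t)
{
	/* No "if (ptr != NULL)" guard is needed: free(NULL) is a no-op. */
	free(t->range_tbl);
	free(t->x_tbl);
	t->range_tbl = NULL;
	t->x_tbl = NULL;
}
```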
Marko Zec
78ae540dec fib_dxr: log malloc() failures.
MFC after:	1 week
Approved by:    re (cperciva)

(cherry picked from commit e9927f4e61)
2024-05-23 06:25:27 +02:00
Marko Zec
b0a1a3138a fib_dxr: set fib_data field in struct dxr_aux early enough
Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures.  This
may have led to NULL ptr dereferencing in dxr_change_rib_batch().

Approved by:	re (cperciva)
MFC after:	1 week

PR:		278422
(cherry picked from commit 0418d7a090)
2024-05-22 19:50:29 +02:00
Denny Page
d776dd5fbd Support ARP for 802 networks
This is used by 802.3 Ethernet.  (It can also be used by 802.4 Token Bus
and 802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
(cherry picked from commit fcdf9a19893b9b5beb7a21407de507f0ae4c500b)
2024-04-27 10:45:22 -04:00
Randall Stewart
7fdef9cdb7 Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold
HPTS inserts a softclock for system call return that optimizes performance. However, when
no HPTS threads need the help (i.e. when they have fewer than 100 or so connections),
little work should be done: check the counter and return instead of running through
all the threads, getting locks, etc.

Reported by:    eduardo
Reviewed by:    gallatin, glebius, tuexen
Tested by:      gallatin
Differential Revision: https://reviews.freebsd.org/D44420

(cherry picked from commit b7b78c1c169dd2213b4cb3e14e19c045b2c5e5af)
2024-04-24 22:37:40 +02:00
Randall Stewart
917b543145 HPTS actually has three states, not two, so the macro needs to account for that.
Fix up tcp_in_hpts() so that it also says yes if you
are in the race state (moving) and you are scheduled to be put in.
This also requires changing the MPASS to use the old, non-inline
function version of tcp_in_hpts().

This change also adds a new inline macro so that a uint64_t timestamp can be
obtained by a transport (aka Rack will use this).

Reviewed by: glebius, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D44157

(cherry picked from commit 638b5ae1c7858373344bc7b9dcb5a1e7fab80bd9)
2024-04-24 22:36:32 +02:00
Randall Stewart
d48e7e89e0 TCP: Fix a rack bug that syzkaller found which results in a crash.
When we call the fast_rsm retransmit path, we should always move
snd_nxt back up to snd_max. In fact, during ack-processing, if snd_nxt
falls behind it should be moved up there as well. Otherwise we can
end up with an incorrect mark on snd_nxt and incorrectly
calculate the offset when we go through the front path (which is
what syzkaller was able to do). Then, when we go to clean up the
send, the offset is all wrong and we crash.

Special thanks to Gleb for pointing out the problem and the email
that had the reproducer so I could find the issue.

Reported-by: syzbot+f5061a372f74f021ec02@syzkaller.appspotmail.com
Sponsored by: Netflix Inc

(cherry picked from commit 8818f0f1124ea3d0e8028f85d667237536eba10c)
2024-04-24 22:26:27 +02:00
Mark Johnston
ca8e2e4c91 tcp: Make tcp_var.h more self-contained
struct tcpcb embeds a struct osd and a struct callout.  Rather than
forcing all consumers to pull in the same headers, include the headers
directly.

No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44685

(cherry picked from commit 1d14e88e5332cfddbec1893f6b5332f81d378d61)
2024-04-17 10:33:27 -04:00
Michael Tuexen
994f9c9e3c tcp rack: fix sending
In rack_output(), idle is used as a boolean variable. So don't use it
as an int and don't clear it afterwards.
This avoids setting idle to false, when it is not intended.

Reported by:		olivier
Reviewed by:		rrs, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44610

(cherry picked from commit 7df0ef5f48e1c67b3f1df7c7964bfa59bc56f4e4)
2024-04-17 16:19:20 +02:00
Michael Tuexen
1348844366 Revert "tcp rack: fix sending"
This reverts commit b5ee7411bf.
2024-04-17 16:17:03 +02:00
Michael Tuexen
b5ee7411bf tcp rack: fix sending
In rack_output(), idle is used as a boolean variable. So don't use it
as an int and don't clear it afterwards.
This avoids setting idle to false, when it is not intended.

Reported by:		olivier
Reviewed by:		rrs, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44610

(cherry picked from commit 7df0ef5f48e1c67b3f1df7c7964bfa59bc56f4e4)
2024-04-17 16:15:03 +02:00
Michael Tuexen
14d7784332 tcp bbr: improve code consistency
Improve code consistency with the RACK stack.
Reviewed by:		gallatin, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44800

(cherry picked from commit 605a00660eadb210ed76d49df551f3f33bbb4da7)
2024-04-17 16:02:33 +02:00
Michael Tuexen
5344772af9 tcp: add some debug output
Also log, when dropping text or FIN after having received a FIN.
This is the intended behavior described in RFC 9293.
A follow-up patch will enforce this behavior for the base stack
and the RACK stack.
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44669

(cherry picked from commit e8c149ab85c7834f76325864f22ca89298e65f75)
2024-04-17 16:01:12 +02:00
Michael Tuexen
90cde57af8 tcp: improve consistency
No functional change intended.

Reported by:		Coverity Scan
CID:			1523781
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44645

(cherry picked from commit 3e1c8a35f741a5d114d0ba670b15191355711fe9)
2024-04-17 16:00:09 +02:00
Michael Tuexen
c6770e8c99 tcp rack: fix memory corruption
When in rack_output() jumping to the label out, don't write errno into
the log buffer, since the pointer is not initialized.

Reported by:		Coverity Scan
CID:			1523773
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44647

(cherry picked from commit d902c8f55b8da6902ab45e67ed756cc99f5a9d5a)
2024-04-17 15:59:05 +02:00
Michael Tuexen
6572d12cfc tcp bblog: cleanup
Remove redundant checks and improve error checking.

Reported by:		Coverity Scan
CID:			1523780
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44616

(cherry picked from commit 60bc195745b8c1e1896c535a491906cdf11fe057)
2024-04-17 15:55:27 +02:00
Michael Tuexen
60898a7cef tcp hpts: initialize variable
Ensure that tv.tv_sec is zero in all code paths.

Reported by:		Coverity Scan
CID:			1527724
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44584

(cherry picked from commit aaaa01c0c858fd703194c6cbd515dd514574381f)
2024-04-17 15:54:52 +02:00
Michael Tuexen
f3e2beb306 tcp: address a warning
t_state is an unsigned variable, so no need for testing that it is
non-negative.

Reported by:		Coverity Scan
CID:			1390885
Reviewed by:		glebius
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44619

(cherry picked from commit 6b454da6bbaa3327cf9b7185d198c96ffc1b88f4)
2024-04-17 15:54:13 +02:00
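
The warning addressed here follows from a general C property: an unsigned value can never be negative, so only the upper bound needs testing. A minimal sketch, in which `EX_NSTATES` and `ex_state_valid()` are hypothetical names rather than the kernel's own:

```c
#include <stdbool.h>

#define EX_NSTATES 11	/* hypothetical number of valid states */

/*
 * For an unsigned argument, "state >= 0" is always true, so a range
 * check reduces to the upper bound alone.
 */
static bool
ex_state_valid(unsigned int state)
{
	return (state < EX_NSTATES);
}
```

A value that was negative before conversion to unsigned wraps to a large number and is rejected by the same upper-bound test.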
Michael Tuexen
7ebfafa813 tcp: fix conversion of rttvar
A wrong variable and wrong scaling factors were used.

Reported by:		Coverity Scan
CID:			1508689
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44612

(cherry picked from commit e0bd180130b8c95f568483d0df6abff00d7d2153)
2024-04-17 15:53:15 +02:00
Michael Tuexen
be30b22fbb tcp: fix comment
Make the comment consistent with the code.

Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44611

(cherry picked from commit 5a268d868890dbbfe96361906be20d01cc252b2f)
2024-04-17 15:52:18 +02:00
Michael Tuexen
8375db082c tcp hpts: improve consistency
The target_slot argument of max_slots_available() can be NULL.
Therefore, check for this in all places.
Right now, all callers provide non-NULL pointer.

Reported by:		Coverity Scan
CID:			1527732
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44527

(cherry picked from commit b600644fdd6cefb1b90d76fdd5aa595946611a7d)
2024-04-17 15:51:43 +02:00
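
The pattern this commit applies, treating an output pointer argument as optional everywhere it is written, might look like the following sketch. The function name, its parameters, and its arithmetic are hypothetical and do not reproduce max_slots_available().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Grant at most free_slots of the wanted slots.  The result is also
 * stored through target_slot, but only when the caller supplied one.
 */
static uint32_t
ex_slots_available(uint32_t free_slots, uint32_t wanted, uint32_t *target_slot)
{
	uint32_t granted = (wanted < free_slots) ? wanted : free_slots;

	if (target_slot != NULL)
		*target_slot = granted;
	return (granted);
}
```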
Michael Tuexen
fbc8dfd0ed tcp bblog: use correct length
The length of tldl_reason is TCP_LOG_REASON_LEN, not TCP_LOG_ID_LEN.
No functional change intended.
Reported by:		Coverity Scan
CID:			1418074
CID:			1418276
Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44510

(cherry picked from commit ed505f893ab08aa61e2a1046ae54df357a108260)
2024-04-17 15:51:00 +02:00
Michael Tuexen
fd2a580db2 tcp: no data on SYN segments unless doing TFO
Ensure that there is no data on SYN segments unless doing TFO.
This check is already in RACK and BBR.

Reported by:		glebius
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D44384

(cherry picked from commit af700f430fd86ba3eae63e587985a12436db8f69)
2024-04-17 15:50:02 +02:00
Michael Tuexen
d69099e433 TCP LRO: add dtrace probe points
Add the IP, UDP, and TCP receive static probes to the code path,
which avoids if_input.

Reviewed by:		rrs, markj
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D43727

(cherry picked from commit 96ad640178ea0a8a9d1772687659dce5be18fbd9)
2024-04-17 15:48:45 +02:00
Michael Tuexen
8244b35ff8 TCP LRO: disable mbuf queuing when packet filter hooks are in place
When doing mbuf queueing, the packet filter hooks in ether_demux(),
ip_input(), and ip6_input() are by-passed. This means that the packet
filters don't process incoming packets, which might result in
connection failures. For example bypassing the TCP sequence number
validation will result in dropping valid packets.
Please note that this patch is only disabling mbuf queueing, not LRO.

Reported by:		Herbert J. Skuhra
Reviewed by:		glebius, rrs, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D43769

(cherry picked from commit d1ce01214a5540db8a7e09fdf46b7ea2d06ffc48)
2024-04-17 15:48:01 +02:00
Michael Tuexen
be7586fe46 sctp: improve sending of packets containing an INIT ACK chunk
If the peer announced support of zero checksums, do so when sending
packets containing an INIT ACK chunk.

(cherry picked from commit 644cffe67f61ad5b36b60d621d1c630ff2a50412)
2024-04-17 15:47:17 +02:00
Michael Tuexen
49557689bf sctp: improve consistency
(cherry picked from commit 533faf21c19d0fa4bc3c0a986c67667991f90883)
2024-04-17 15:46:32 +02:00
Michael Tuexen
8b2c694d38 RACK, BBR: handle EACCES like EPERM for IP output handling
The FreeBSD TCP base stack handles them also the same way.
In case of packet filters dropping packets in the output path,
this avoids retransmitting the dropped packet every 10ms or so.

Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D43773

(cherry picked from commit 2f4e46dfdd710c6679f233480c9de430e6c4ef9b)
2024-04-17 15:44:15 +02:00
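
One way to express "handle EACCES like EPERM" is to classify both error codes in a single switch arm. This is only a sketch with a hypothetical helper name; what the RACK and BBR stacks do after recognizing such an error is not modeled here.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Return true for output errors that indicate the packet was refused
 * by policy (for example, dropped by a packet filter), as opposed to
 * a transient resource problem.
 */
static bool
ex_output_error_is_policy_drop(int error)
{
	switch (error) {
	case EPERM:
	case EACCES:
		return (true);
	default:
		return (false);
	}
}
```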
Michael Tuexen
2da45e304d TCP LRO: convert TCP header fields to host byte order earlier
This is a preparation for adding dtrace hooks in a follow-up commit,
which are missing in the code path, where packets are directly queued
to the tcpcb. The dtrace hooks expect the fields to be in host byte
order. This only applies when TCP HPTS is used.
No functional change intended.

Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D43594

(cherry picked from commit f30c7d56546b9f36e42351fb385d96e37dbac1d5)
2024-04-17 15:42:13 +02:00
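
Converting header fields from network to host byte order, as this commit describes, can be sketched in userland with `ntohs()` and `ntohl()`. The struct below is a hypothetical subset of header fields, not the kernel's `struct tcphdr`.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical subset of TCP header fields, as received from the wire. */
struct ex_tcp_fields {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
	uint32_t ack;
};

/* Convert all fields in place from network to host byte order. */
static void
ex_tcp_fields_to_host(struct ex_tcp_fields *f)
{
	f->sport = ntohs(f->sport);
	f->dport = ntohs(f->dport);
	f->seq = ntohl(f->seq);
	f->ack = ntohl(f->ack);
}
```

Doing the conversion once, early, lets every later consumer (such as a tracing hook) read the fields without having to know which byte order they are currently in.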
Warner Losh
89dd0612d9 netinet: The tailq_hash code doesn't reference tcpoutflags
Don't define TCPOUTFLAGS to get the static definition from tcp_fsm.h.
tailq_hash.c doesn't reference tcpoutflags. Only files that reference this
should define TCPOUTFLAGS. clang is fine with it, but gcc12 complained.

Sponsored by:		Netflix

(cherry picked from commit afd155c72bf65c056d19473569cc78c6e5807b3b)
2024-04-15 11:03:28 -07:00
John Baldwin
ce5a6a3add tcp: Add a new kernel-only TCP_USE_DDP socket option
This socket option can be used by in-kernel consumers (like NFS) to
request a NIC to use optimized receive of large buffers for a
connection.  The current use case is to support DDP by the TOE on
Chelsio NICs.

Reviewed by:	rscheff, tuexen, glebius
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44000

(cherry picked from commit 3d0a736796a99fe70be9de97beec8f10970c6905)
2024-04-12 12:25:11 -07:00
Gleb Smirnoff
d6e1ae659b carp: check CARP status in in_localip_fib(), in6_localip_fib()
Don't report a BACKUP CARP address as local.  These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation.  For this purpose we definitely
want to treat BACKUP addresses as non local.

This change is conservative and doesn't modify compat in_localip() and
in6_localip().  They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword.  There might
be other consequences as in_localip() is used by various tunneling
protocols.

PR:			277349
(cherry picked from commit 56f7860087eec14b4a65310b70bd704e79e1b48c)
2024-03-28 12:35:45 -07:00
Konstantin Belousov
87c7f74ff4 netinet/tcp_var.h: always define IS_FASTOPEN() for kernel compilation env
(cherry picked from commit 220ee18f196482c534a659d1eb50db26c54ca7d0)
2024-03-20 02:53:28 +02:00
Kyle Evans
160788caa3 ktrace: log genio events on failed write
Visibility into the contents of the buffer when a write(2) has failed
can be immensely useful in debugging IPC issues -- pushing this to
discuss the idea, or maybe an alternative where we can set a flag like
KTRFAC_ERRIO to enable it.

When a genio event is potentially raised after an error, currently we'll
just free the uio and return.  However, such data can be useful when
debugging communication between processes to, e.g., understand what the
remote side should have grabbed before closing a pipe.  Tap out the
entire buffer on failure rather than simply discarding it.

Reviewed by:	kib, markj

(cherry picked from commit 47ad4f2d45e406c6316909bc12bc760b2fdd6afb)
2024-03-18 10:52:58 -05:00
Gordon Bergling
6cf569e659 carp(4): Fix a typo in a source code comment
- s/successfull/successful/

(cherry picked from commit 6bce41a38e32decbce80bb1586cdd9400c83eb97)
2024-03-03 18:48:32 +01:00
Richard Scheffenegger
f3f559705a tcp: cubic - restart epoch after RTO
This is a mitigation to avoid sudden extreme jumps in
cwnd, as t_epoch can be very out of date after an RTO.
Per RFC9438, sec 4.8, t_epoch is to be reset whenever
cwnd grows beyond ssthresh (CC phase transitions from
slow start to congestion avoidance), to be fixed with
the upcoming cc_cubic changes.

MFC after:		3 days
Reviewed By:		cc, #transport
Sponsored by:		NetApp, Inc
Differential Revision:	https://reviews.freebsd.org/D44023

(cherry picked from commit 038699a8f18a0a651ee06b85fa1dbbee1eab56f1)
2024-02-27 12:00:56 +01:00
Richard Scheffenegger
419848219b tcp: prevent div by zero in cc_htcp
Make sure the divisor is at least one. While cwnd should
never be smaller than t_maxseg, this can happen during
Path MTU Discovery, or when TCP options are considered
in other parts of the stack.

PR:			276674
MFC after:		3 days
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D43797

(cherry picked from commit 38983d40c18ec5705dcba19ac320b86c5efe8e7e)
2024-02-27 12:00:55 +01:00
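
The guard described in this commit can be sketched as clamping the value being divided by to a minimum of one. The function names and the formula are hypothetical illustrations, not the cc_htcp code, and the sketch assumes maxseg itself is nonzero.

```c
#include <stdint.h>

/*
 * Number of whole segments in cwnd, never less than one.  When cwnd is
 * smaller than maxseg the integer quotient is 0, which would otherwise
 * be used as a divisor.
 */
static uint32_t
ex_cwnd_segments(uint32_t cwnd, uint32_t maxseg)
{
	uint32_t segs = cwnd / maxseg;

	return (segs > 0 ? segs : 1);
}

/* Hypothetical per-segment increment that divides by the clamped count. */
static uint32_t
ex_increment(uint32_t alpha, uint32_t cwnd, uint32_t maxseg)
{
	return (alpha / ex_cwnd_segments(cwnd, maxseg));
}
```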
Gordon Bergling
c71ed5dc63 sctp(4): Fix a typo in a source code comment
- s/anthing/anything/

(cherry picked from commit 2fb174d18a42d1b2965164186843540ee65881ea)
2024-02-21 08:17:54 +01:00
Mark Johnston
c8691d183d tcp: Silence a -Wunused-function warning in tcp_ratelimit.h
No functional change intended.

(cherry picked from commit 876fddc886987ddbc89c412b15874749764167ac)
2024-02-18 15:26:28 +01:00
Richard Scheffenegger
9bc48382a5 tcp: move cc_post_recovery past snd_una update
The RFC6675 pipe calculation (sack.revised, enabled
by default since D28702), uses outdated information,
while the previous default calculated it correctly
with up-to-date information from the incoming ACK.

This difference can become as large as the receive
window (not the congestion window previously),
potentially triggering a massive burst of new packets.

MFC after:             1 week
Reviewed By:           tuexen, #transport
Sponsored by:          NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43520

(cherry picked from commit 0b3f9e435f2bde9e5be27030d9f574a977a1ad47)
2024-02-01 19:13:53 +01:00
Mark Johnston
93f523ab36 netinet: Remove stale references to Giant from comments
MFC after:	1 week

(cherry picked from commit bbf86c65d04d6013fd3f7b6d74a341256c4e7336)
2024-02-03 14:10:36 -05:00
Gordon Bergling
0334b131c5 tcp_fastopen: Fix a typo in a source code comment
- s/posession/possession/

(cherry picked from commit 9b035689f15fc4aec96f9c18c6c86bd615faed2f)
2024-01-25 07:44:39 +01:00
Gordon Bergling
49e450f43d tcp_hpts: Fix a typo of a function name in a comment
- s/tcp_ouput/tcp_output/

(cherry picked from commit ef0ac0a1ad6750291b881203030384b7f7241efb)
2024-01-23 07:42:31 +01:00
John Baldwin
9c50c9b776 sys: Use mbufq_empty instead of comparing mbufq_len against 0
Reviewed by:	bz, emaste
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43338

(cherry picked from commit 8cb9b68f5821e45c63ee08d8ee3029ca523ac174)
2024-01-18 14:37:29 -08:00