6928 commits

Author SHA1 Message Date
Mark Johnston
668a555de6 rip: Add missing minimum length validation in rip_output()
If the socket is configured such that the sender is expected to supply
the IP header, then we need to verify that it actually did so.

Reported by:	syzkaller+KMSAN
Reviewed by:	donner
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ba21825202)
2021-08-02 15:01:11 -04:00
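A minimal sketch of the kind of check commit 668a555de6 describes: when the sender is expected to supply the IP header, reject any buffer too short to contain one. The helper name `hdrincl_len_ok` and the flat byte buffer are assumptions for illustration; the kernel code operates on mbufs inside rip_output().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of an IPv4 header without options, in bytes. */
#define MIN_IPV4_HDR_LEN 20u

/*
 * Hypothetical helper: true if a caller-supplied packet is long enough
 * to hold the IPv4 header that its first byte claims to carry.
 */
bool
hdrincl_len_ok(const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL);
	if (len < MIN_IPV4_HDR_LEN)
		return false;
	/* The low nibble of byte 0 is the header length in 32-bit words. */
	size_t hlen = (size_t)(pkt[0] & 0x0f) * 4;
	return hlen >= MIN_IPV4_HDR_LEN && hlen <= len;
}
```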
Richard Scheffenegger
e4ee2a39ad tcp: Add PRR cwnd reduction for non-SACK loss
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing arriving ACKs from
clocking out new data at the old, too-high rate. This
reduces the chance of inducing additional losses while
recovering from loss (during congested network conditions).

For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.

MFC after: 6 weeks
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29441

(cherry picked from commit 74d7fc8753)
2021-08-02 13:59:23 +02:00
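The non-SACK accounting in commit e4ee2a39ad can be sketched roughly as follows, using only the proportional part of the RFC 6937 formula, sndcnt = ceil(prr_delivered * ssthresh / RecoverFS) - prr_out. The struct, the function name, and the assumption that every permitted byte is sent immediately are illustrative and not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection PRR state, all values in bytes. */
struct prr_state {
	uint32_t recover_fs;    /* flight size when recovery started */
	uint32_t ssthresh;      /* target window after the reduction */
	uint32_t prr_delivered; /* data assumed delivered in recovery */
	uint32_t prr_out;       /* data sent during recovery */
};

/* Returns how many bytes one arriving ACK allows the sender to send. */
uint32_t
prr_on_nonsack_ack(struct prr_state *s, uint32_t mss)
{
	assert(s->recover_fs > 0 && s->prr_delivered <= s->recover_fs);

	/*
	 * Without SACK, credit one MSS per ACK, but never more than one
	 * window in total, so that split ACKs cannot inflate the count.
	 */
	uint32_t room = s->recover_fs - s->prr_delivered;
	s->prr_delivered += (mss < room) ? mss : room;

	/* Proportional reduction, rounded up. */
	uint64_t target = ((uint64_t)s->prr_delivered * s->ssthresh +
	    s->recover_fs - 1) / s->recover_fs;
	uint32_t sndcnt = (target > s->prr_out) ?
	    (uint32_t)(target - s->prr_out) : 0;

	s->prr_out += sndcnt;	/* assume all of it is sent right away */
	return sndcnt;
}
```

With a 10000-byte flight and an ssthresh of 5000, each 1000-byte credit releases 500 bytes, and ACKs beyond one window release nothing.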
Kristof Provost
c3d03672e1 pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature helps pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by:	kbowling
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31138

(cherry picked from commit 8e1864ed07)
2021-07-27 09:42:25 +02:00
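The cookie idea in commit c3d03672e1 can be illustrated with a toy: derive the SYN+ACK sequence number from the connection tuple and a secret, keep no state, and recompute it when the client's ACK arrives. The FNV-1a style mixing, the struct, and the function names below are assumptions chosen for brevity; they are not pf's actual hash or cookie layout, and a real implementation needs a keyed cryptographic hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection tuple. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy FNV-1a style mixing of one 32-bit value; NOT cryptographically safe. */
static uint32_t
mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Compute the sequence number to place in the SYN+ACK. */
uint32_t
syncookie_make(const struct flow *f, uint32_t client_isn, uint32_t secret)
{
	assert(f != NULL);
	uint32_t h = 2166136261u ^ secret;
	h = mix(h, f->saddr);
	h = mix(h, f->daddr);
	h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
	return mix(h, client_isn);
}

/*
 * Validate the client's final ACK: its seq is client_isn + 1 and its
 * ack is cookie + 1, so the cookie can be recomputed without state.
 */
bool
syncookie_check(const struct flow *f, uint32_t seq, uint32_t ack,
    uint32_t secret)
{
	return syncookie_make(f, seq - 1, secret) == ack - 1;
}
```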
Michael Tuexen
9b1219b24a tcp: fix RACK and BBR when using VIMAGE enabled kernel
Fix a bug in VNET handling, which occurs when using specific NICs.
PR:			257195
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31212

(cherry picked from commit a730d82378)
2021-07-22 11:13:31 +02:00
Stefan Eßer
791035c8da libalias: fix divide by zero causing panic
The packet_limit can fall to 0, leading to a divide by zero abort in
the "packets % packet_limit".

A possible solution would be to apply a lower limit of 1 after the
calculation of packet_limit, but since any number modulo 1 gives 0,
the more efficient solution is to skip the modulo operation for
packet_limit <= 1.

Reported by:	Karl Denninger <karl@denninger.net>

(cherry picked from commit 58080fbca0)
2021-07-14 13:49:21 +02:00
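The reasoning in commit 791035c8da can be sketched as a guard in front of the modulo. The function name and the meaning of its return value are assumptions for illustration; only the `packets % packet_limit` expression and the `packet_limit <= 1` condition come from the message.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true when the periodic work should run for this
 * packet. Any number modulo 1 is 0 and modulo 0 is undefined, so both
 * cases are answered without dividing.
 */
bool
housekeeping_due(unsigned int packets, unsigned int packet_limit)
{
	if (packet_limit <= 1)
		return true;
	return (packets % packet_limit) == 0;
}
```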
Andrew Gallatin
7751a6b585 tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate
TCP stacks do not build when various combinations of
ipv4 and ipv6 are disabled.

Reviewed by:		rrs, tuexen
Differential Revision:	https://reviews.freebsd.org/D31094
Sponsored by:		 Netflix

(cherry picked from commit b1e806c0ed)
2021-07-13 22:00:50 +02:00
Randall Stewart
1bb521ab7d tcp: Fix 32 bit platform breakage
This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms it's
not a u64. Instead use the long directive.

Reviewed by:		tuexen
Sponsored by:		Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D31107

(cherry picked from commit 7312e4e5cf)
2021-07-13 21:59:50 +02:00
Randall Stewart
deb3c279d1 tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. Instead of relying on the timer/callout to drive it, hpts is now
driven by user return from a system call as well as by LRO flushes.
The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by:		tuexen, glebius
Sponsored by:		Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D31083

(cherry picked from commit d7955cc0ff)
2021-07-13 21:58:30 +02:00
Randall Stewart
3b2aeae726 tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running
out of data when we decide to make the measurement. In reality we should not make
such a measurement if there is no chance we can have "enough" data. There are also
some corner-case TLPs that end up not registering as a TLP like they should; we
fix this by pushing the doing_tlp setup to the actual timeout that knows it did
a TLP. This makes it so we always have the appropriate flag on the sendmap
indicating a TLP being done as well as count correctly so we make no more
than two TLPs.

In addressing the goodput let's also add a "quality" metric that can be viewed via
blackbox logs so that a casual observer does not have to figure out how good
of a measurement it is. This is needed due to the fact that we may still make
a measurement that is of a poorer quality as we run out of data but still have
a minimal amount of data to make a measurement.

Reviewed by:		tuexen
Sponsored by: 		Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D31076

(cherry picked from commit e834f9a44a)
2021-07-13 21:57:20 +02:00
Randall Stewart
2e1fdc728b tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well, except that
when we have connections that retransmit a lot we get into trouble with all the retransmits.
This prep step makes way for a change that Drew will be making so that we can "kick out" a
session from hardware TLS.

Reviewed by:		tuexen, gallatin
Sponsored by: 		Netflix Inc
Differential Revision: 	https://reviews.freebsd.org/D30895

(cherry picked from commit 9e4d9e4c4d)
2021-07-13 21:56:06 +02:00
Andrew Gallatin
1843b06dee tcp: enter network epoch when calling tfb_tcp_fb_fini
We need to enter the network epoch when calling into
tfb_tcp_fb_fini.  I noticed this when I hit an assert
running the latest rack.

Differential Revision: https://reviews.freebsd.org/D30407
Reviewed by: rrs, tuexen
Sponsored by: Netflix

(cherry picked from commit 086a35562f)
2021-07-13 21:54:43 +02:00
Randall Stewart
648c68168c tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to
a rack server running a V4/V6 socket, as well as a bug that stopped the
mapped v4 in V6 address from working.

Reviewed by: 		tuexen
Sponsored by: 		Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30885
PR:			256657
(cherry picked from commit 66aec14a53)
2021-07-13 20:51:55 +02:00
Michael Tuexen
a3f96f1e57 sctp: Fix errno in case of association setup failures
Do not always report ETIMEDOUT, but only when appropriate. In
other cases report ECONNABORTED.

(cherry picked from commit 105b68b42d)
2021-07-13 20:30:57 +02:00
Michael Tuexen
9e8d5ea2b5 sctp: provide consistent stream information in case of early errors
While there, make sure the function is called correctly.

(cherry picked from commit ce64352a70)
2021-07-13 20:30:21 +02:00
Michael Tuexen
7f21500f49 sctp: provide sac_error also for ABORT chunk being sent
Thanks to Florent Castelli for bringing this issue up for the
userland stack and providing an initial patch.

(cherry picked from commit 84992a3251)
2021-07-13 20:29:33 +02:00
Michael Tuexen
6a4f29a3c4 sctp: initialize sequence numbers for ECN correctly
Reported by:	Junseok Yang (for the userland stack)

(cherry picked from commit c7f048ab35)
2021-07-13 20:28:48 +02:00
Michael Tuexen
24df96b642 sctp: Fix length check for ECNE chunks
(cherry picked from commit 6587a2bd1e)
2021-07-13 20:27:58 +02:00
Michael Tuexen
c5ba872129 tcp: tolerate missing timestamps
Some TCP stacks negotiate TS support, but do not send TS at all
or not for keep-alive segments. Since this includes modern widely
deployed stacks, tolerate the violation of RFC 7323 per default.

Reviewed by:		rgrimes, rrs, rscheff
Differential Revision:	https://reviews.freebsd.org/D30740
Sponsored by:		Netflix, Inc.

(cherry picked from commit 870af3f4dc)
2021-07-13 20:24:09 +02:00
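The policy in commit c5ba872129 can be sketched as a small decision function. RFC 7323 says that once timestamps are negotiated, a non-RST segment arriving without the option should be silently dropped; the `tolerate_missing_ts` parameter here is a stand-in for the tolerant default the commit describes, and all names are assumptions for illustration.

```c
#include <stdbool.h>

enum seg_verdict { SEG_ACCEPT, SEG_DROP };

/*
 * Hypothetical check applied to an incoming segment. Only segments that
 * violate RFC 7323 (timestamps negotiated, option missing, not a RST)
 * are candidates for dropping, and only when tolerance is switched off.
 */
enum seg_verdict
check_missing_ts(bool ts_negotiated, bool seg_has_ts, bool seg_is_rst,
    bool tolerate_missing_ts)
{
	if (!ts_negotiated || seg_has_ts || seg_is_rst)
		return SEG_ACCEPT;
	return tolerate_missing_ts ? SEG_ACCEPT : SEG_DROP;
}
```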
Lutz Donnerhacke
3a96a25da8 libalias: Switch to SPLAY trees
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunately, a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature (almost O(1) for access of the
least recently used entries) and amortized O(ln(n)) for almost all
other cases.  Get rid of the hash.

Now the data structure should be able to quickly react to external
packets without eating CPU cycles for breakfast, preventing a DoS.

PR:		192888
Discussed with:	Dimitry Luhtionov
Differential Revision: https://reviews.freebsd.org/D30516
Differential Revision: https://reviews.freebsd.org/D30536
Differential Revision: https://reviews.freebsd.org/D30844

(cherry picked from commit 935fc93af1)
(cherry picked from commit d261e57dea)
(cherry picked from commit f70c98a2f5)
(cherry picked from commit 25392fac94)
(cherry picked from commit 2f4d91f9cb)
(cherry picked from commit 4060e77f49)
2021-07-06 08:55:53 +02:00
Lutz Donnerhacke
78d515b222 libalias: Restructure
Clean up the database handling in order to switch to more efficient
data structures.  The development of this patch was artificially split
into many small steps to ease reviewing.

- Common search terms
- Separate fully qualified search
- Separate table for partial links
- Cleanup _FindLinkIn
- Factor out the outgoing search function
- Factor out a common idiom to return found links
- Reorder incoming links by grouping of common search terms
- Remove LSNAT from outgoing search
- Group internal structure semantically
- Separate table for PPTP
- Use AliasRange instead of PORT_BASE
- Remove temporary state deleteAllLinks from global struct
- Avoid uninitialized expiration

Discussed with:	Dimitry Luhtionov
Differential Revision: https://reviews.freebsd.org/D30568
Differential Revision: https://reviews.freebsd.org/D30569
Differential Revision: https://reviews.freebsd.org/D30570
Differential Revision: https://reviews.freebsd.org/D30571
Differential Revision: https://reviews.freebsd.org/D30572
Differential Revision: https://reviews.freebsd.org/D30573
Differential Revision: https://reviews.freebsd.org/D30574
Differential Revision: https://reviews.freebsd.org/D30575
Differential Revision: https://reviews.freebsd.org/D30580
Differential Revision: https://reviews.freebsd.org/D30581
Differential Revision: https://reviews.freebsd.org/D30604
Differential Revision: https://reviews.freebsd.org/D30582

(cherry picked from commit d41044ddfd)
(cherry picked from commit 32f9c2ceb3)
(cherry picked from commit cac129e603)
(cherry picked from commit 19dcc4f225)
(cherry picked from commit d541903438)
(cherry picked from commit d4ab07d2ae)
(cherry picked from commit 492d3b7109)
(cherry picked from commit 7b44ff4c52)
(cherry picked from commit 1178dda53d)
(cherry picked from commit 9efcad61d8)
(cherry picked from commit fe83900f9f)
(cherry picked from commit d989935b5b)
(cherry picked from commit b50a4dce18)
(cherry picked from commit f284553444)
2021-07-06 08:55:53 +02:00
Lutz Donnerhacke
390866d47e libalias: Promote per instance global variable timeStamp
Summary:
- Use LibAliasTime as a real global variable for central timekeeping.
- Reduce number of syscalls in user space considerably.
- Dynamically adjust the packet counters to match the second resolution.
- Only check the first few packets after a time increase for expiry.

Discussed with:	hselasky
Differential Revision: https://reviews.freebsd.org/D30566

(cherry picked from commit ef828d39be)
2021-07-06 08:55:53 +02:00
Lutz Donnerhacke
69965155a5 libalias: Stats are unsigned
Stats counters are used as unsigned values (i.e. printf("%u")) but are
defined as signed int.  This causes trouble later, so fix it early.

Differential Revision: https://reviews.freebsd.org/D30587

(cherry picked from commit 3fd20a79e7)
2021-07-06 08:55:52 +02:00
Lutz Donnerhacke
3423d44cd1 libalias: tidy up housekeeping
Replace current expensive, but sparsely called housekeeping
by a single, repetitive action.

This is part of a larger restructure of libalias in order to switch to
more efficient data structures.  The whole restructure process is
split into 15 reviews to ease reviewing.  All those steps will be
squashed into a single commit for MFC in order to hide the
intermediate states from production systems.

Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30277

(cherry picked from commit 294799c6b0)
2021-07-06 08:55:52 +02:00
Mark Johnston
d77e57f125 Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols.  No
functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit a100217489)
2021-06-21 09:14:48 -04:00
Mark Johnston
46d8116cae Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit f4bb1869dd)
2021-06-21 09:14:40 -04:00
Marko Zec
4715d948c5 Introduce DXR as an IPv4 longest prefix matching / FIB module
DXR maintains compressed lookup structures with a trivial search
procedure.  A two-stage trie is indexed by the more significant bits of
the search key (IPv4 address), while the remaining bits are used for
finding the next hop in a sorted array.  The tradeoff between memory
footprint and search speed depends on the split between the trie and
the remaining binary search.  The default of 20 bits of the key being
used for trie indexing yields good performance (see below) with
footprints of around 2.5 Bytes per prefix with current BGP snapshots.

Rebuilding lookup structures takes some time, which is compensated for by
batching several RIB change requests into a single FIB update, i.e. FIB
synchronization with the RIB may be delayed for a fraction of a second.
RIB to FIB synchronization, next-hop table housekeeping, and lockless
lookup capability are provided by the FIB_ALGO infrastructure.

DXR works well on modern CPUs with several MBytes of caches, especially
in VMs, where it outperforms other currently available IPv4 FIB
algorithms by a large margin.

Reviewed by:	melifaro
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D29821

(cherry picked from commit 2aca58e16f)
2021-06-17 12:07:05 +02:00
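The lookup described in commit 4715d948c5 can be approximated by a much simplified sketch: the high bits of the address index a table directly, and the low bits are resolved by binary search in a sorted array of range fragments. This is not the DXR code. It uses a single direct table instead of the two-stage trie, splits at 16 bits rather than the default 20 to keep the table small, and every name in it is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECT_BITS 16			/* the commit's default is 20 */
#define LOW_BITS    (32 - DIRECT_BITS)
#define LOW_MASK    ((1u << LOW_BITS) - 1u)
#define NO_ROUTE    0xffffu

struct range_entry {
	uint16_t start;		/* first low-bits value of this fragment */
	uint16_t nexthop;	/* next hop for [start, next fragment) */
};

struct direct_entry {
	uint32_t base;		/* index of the chunk's first fragment */
	uint32_t count;		/* fragments in the chunk, 0 = no route */
};

uint16_t
dxr_sketch_lookup(const struct direct_entry *direct,
    const struct range_entry *ranges, uint32_t addr)
{
	const struct direct_entry *de = &direct[addr >> LOW_BITS];
	uint32_t low = addr & LOW_MASK;
	uint32_t lo = 0, hi = de->count;

	if (de->count == 0)
		return NO_ROUTE;

	const struct range_entry *chunk = &ranges[de->base];
	/* Each chunk must start at 0 so that every key falls in a fragment. */
	assert(chunk[0].start == 0);

	/* Find the last fragment whose start is <= low. */
	while (hi - lo > 1) {
		uint32_t mid = lo + (hi - lo) / 2;
		if (chunk[mid].start <= low)
			lo = mid;
		else
			hi = mid;
	}
	return chunk[lo].nexthop;
}
```

Moving the split point trades table size against the length of the binary search, which is the memory versus speed tradeoff the message mentions.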
Zhenlei Huang
7da8312f7b Do not forward datagrams originated by link-local addresses
The current implementation of ip_input() rejects packets destined for
169.254.0.0/16, but not those originating from 169.254.0.0/16 link-local
addresses.

Fix to fully respect RFC 3927 section 2.7.

PR:		255388
Reviewed by:	donner, rgrimes, karels
Differential Revision:	https://reviews.freebsd.org/D29968
Reviewed by:	rgrimes, donner, karels, marcus, emaste
Differential Revision: https://reviews.freebsd.org/D30374

(cherry picked from commit 3d846e4822)
(cherry picked from commit 03b0505b8f)
2021-06-17 10:08:59 +02:00
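The rule in commit 7da8312f7b amounts to testing both addresses against 169.254.0.0/16 before forwarding. A minimal sketch, with addresses in host byte order and hypothetical function names (the kernel has its own macros and performs the check inside ip_input()):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) lies within 169.254.0.0/16. */
bool
is_ipv4_linklocal(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/*
 * Hypothetical forwarding predicate: a datagram is not forwarded if
 * either its source or its destination is link-local.
 */
bool
may_forward(uint32_t src, uint32_t dst)
{
	return !is_ipv4_linklocal(src) && !is_ipv4_linklocal(dst);
}
```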
Randall Stewart
d0eaf95edc tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and
then not including a timestamp. In the input path this involved a goto to the done_with_input
label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). This then caused the missing m_freem to show
up in both BBR and Rack. Also looking at the code referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though it's only for logging. Best to
copy that off before any frees can take place.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30727

(cherry picked from commit ba1b3e48f5)
2021-06-14 23:00:17 +02:00
Randall Stewart
8ecbecdcfd tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent
commits from Richard that cause the socket buffer lock to be held over
the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root cause out the issue.

In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richard's changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.

We may want to look at adding back in Richard's changes after I have pinpointed
the root cause of the mbuf leak and fixed it.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30704

(cherry picked from commit 67e892819b)
2021-06-14 22:51:42 +02:00
Randall Stewart
2071c3fb0d tcp: LRO timestamps have lost their previous precision
Recently we had a rewrite to tcp_lro.c that was tested but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in tcp's that do pacing and needs to be fixed to use the more precise
time that was there before.

Reviewed by: mtuexen, gallatin, hselasky
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30695

(cherry picked from commit b45daaea95)
2021-06-14 22:49:27 +02:00
Michael Tuexen
82f75079f1 tcp: fix two bugs in new reno
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
  or it is requested by the stack.

Reviewed by:		rscheff, rrs
Sponsored by:		Netflix, Inc.

(cherry picked from commit fa3746be42)
2021-06-14 01:29:14 +02:00
Michael Tuexen
fce16041a8 tcp: remove debug output from RACK
Reported by:		iron.udjin@gmail.com, Marek Zarychta
Reviewed by:		rrs
PR:			256538
Differential Revision:	https://reviews.freebsd.org/D30723
Sponsored by:		Netflix, Inc.

(cherry picked from commit f1536bb538)
2021-06-14 01:28:19 +02:00
Michael Tuexen
7a2030a106 tcp: fix compilation of IPv4-only builds
PR:			256538
Reported by:		iron.udjin@gmail.com
Sponsored by:		Netflix, Inc.

(cherry picked from commit 224cf7b35b)
2021-06-14 01:27:17 +02:00
Hans Petter Selasky
ca81bcbbf1 Add missing chunks after cherry-picking 9ca874cf74
from main to stable/13: Add TCP LRO support for VLAN and VxLAN.

Make sure all counters are allocated.

This is a direct commit.

Reported by:	Herbert J. Skuhra <herbert@gojira.at>
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-06-11 15:51:05 +02:00
Michael Tuexen
71b88ee39f tcp: fix a RACK socket buffer lock issue
Fix a missing socket buffer unlocking of the socket receive buffer.

Reviewed by:		gallatin, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D30402

(cherry picked from commit 9bbd1a8fcb)
2021-06-09 12:51:50 +02:00
Richard Scheffenegger
0230e6cf56 rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.

Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390

(cherry picked from commit 3975688563)
2021-06-09 12:51:36 +02:00
Richard Scheffenegger
55cc0a4785 [tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.

Reported by: rmacklem
Reviewed By: tuexen, #transport
Tested by: rmacklem, otis
MFC after: 2 weeks
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29690

(cherry picked from commit 032bf749fd)
2021-06-09 12:51:19 +02:00
Randall Stewart
fc53b7269f tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.

I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30627

(cherry picked from commit 4747500dea)
2021-06-09 02:19:47 +02:00
Randall Stewart
eb91abb4ba tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reass() code and that way the FIN will be stripped and all will be well.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30497

(cherry picked from commit 8c69d988a8)
2021-06-09 02:18:36 +02:00
Randall Stewart
362f95f528 tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
to a time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. It's possible that
lowering this value to something more in line with today's delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30249

(cherry picked from commit 4f3addd94b)
2021-06-09 02:16:00 +02:00
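The 60kbps figure in commit 362f95f528 can be checked with a line of arithmetic: a segment of N bytes fits in a window of T milliseconds on any link of at least N * 8 * 1000 / T bits per second. The helper below is a hypothetical illustration; the 1500-byte segment size is an assumption, since the message only says "1 MSS".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Link rate, in bits per second, at which seg_bytes takes exactly
 * window_ms milliseconds to transmit.
 */
uint64_t
link_rate_for_segment(uint32_t seg_bytes, uint32_t window_ms)
{
	assert(window_ms > 0);
	return (uint64_t)seg_bytes * 8u * 1000u / window_ms;
}
```

For a 1500-byte segment, a 200ms slop corresponds to 60000 bit/s, while a 40ms slop corresponds to 300000 bit/s.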
Randall Stewart
12e181b672 tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map; this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.

Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. It turns
out that the reproducer on the default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30451

(cherry picked from commit 13c0e198ca)
2021-06-09 02:13:32 +02:00
Randall Stewart
e99fa57b98 tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30413

(cherry picked from commit 631449d5d0)
2021-06-09 02:12:21 +02:00
Michael Tuexen
6264ff9bd9 tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.

Reviewed by:		markj
Reported by:		syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com
Sponsored by:		Netflix, Inc.
Differential revision:	https://reviews.freebsd.org/D30395

(cherry picked from commit 8923ce6304)
2021-06-09 02:09:37 +02:00
Michael Tuexen
56aeedd2fd tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in
https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605,
the length of IP level options was considered when locating the
transport header. This was incorrect and is fixed by this patch.

X-MFC with:		https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605
Reviewed by:		markj, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D30358

(cherry picked from commit 500eb6dd80)
2021-06-09 02:06:26 +02:00
Randall Stewart
3a3bba7df5 tcp: Incorrect KASSERT causes a panic in rack
Syzkaller found an interesting panic in rack. When a SYN and FIN are
both sent together a KASSERT gets tripped where it is validating that
a mbuf pointer is in the sendmap. But a SYN and FIN often will not
have a mbuf pointer. So the fix is twofold: a) make sure that the
SYN and FIN split the right way when cloning an RSM (SYN on left
edge and FIN on right), and b) make sure the KASSERT properly
accounts for the case that we have a SYN or FIN so we don't
panic.

Reviewed by: mtuexen
Sponsored by: Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D30241

(cherry picked from commit 02cffbc250)
2021-06-09 02:05:15 +02:00
Michael Tuexen
6170c93c03 tcp rack: improve initialisation of retransmit timeout
When the TCP is in the front states, don't take the slop variable
into account. This improves consistency with the base stack.

Reviewed by:		rrs@
Differential Revision:	https://reviews.freebsd.org/D30230
Sponsored by:		Netflix, Inc.

(cherry picked from commit 251842c639)
2021-06-09 02:02:39 +02:00
Randall Stewart
ecfc25f05b tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However,
there is a snafu here: the only time the HC will actually restore the
srtt is when tp->srtt is 0. We then need to convert the srtt only
when it is actually restored. We do this by making
sure it was zero before the call to cc_conn_init and it is non-zero
afterwards.

Reviewed by:	Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30213

(cherry picked from commit 4b86a24a76)
2021-06-09 02:01:32 +02:00
Randall Stewart
87cf5dcc33 tcp: Host cache and rack ending up with incorrect values.
The hostcache up to now has been updated in the discard callback
but without checking if we are all done (the race where there is
more than one call and the counter has not yet reached zero). This
means that when the race occurs, we end up calling the hc_update
more than once. Also alternate stacks can keep their srtt/rttvar
in different formats (example rack keeps its values in microseconds).
Since we call the hc_update *before* the stack fini() then the
values will be in the wrong format.

Rack on the other hand, needs to convert items pulled from the
hostcache into its internal format else it may end up with
very much incorrect values from the hostcache. In the process
let's commonize the update mechanism for srtt/rttvar since we
now have more than one place that needs to call it.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30172

(cherry picked from commit 9867224bab)
2021-06-09 02:00:27 +02:00
Randall Stewart
4651125ac6 This takes Warner's suggested approach to making it so that
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).

Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30163

(cherry picked from commit 5a4333a537)
2021-06-09 01:59:21 +02:00
Randall Stewart
f5d0badc70 Fix a UDP tunneling issue with rack. Basically there are two
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
   in place.
and
B) Not enough memory is allocated in rack's fsb. We need to
   overbook the fsb to include a udphdr just in case.

Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157

(cherry picked from commit a16cee0218)
2021-06-09 01:58:06 +02:00
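Issue A in commit f5d0badc70 comes down to header-length accounting: when TCP is tunneled over UDP, the UDP header has to be counted too. A rough sketch, with a hypothetical function name and parameters (a nonzero `udp_tunnel_port` stands for "a tunnel is in place"); this is not the rack code itself.

```c
#include <stdint.h>

#define TCP_HDR_LEN 20u	/* TCP header without options */
#define UDP_HDR_LEN 8u	/* fixed size of a UDP header */

/* Total bytes of headers in front of the payload of one segment. */
uint32_t
tx_hdrlen(uint32_t ip_hlen, uint32_t tcp_optlen, uint16_t udp_tunnel_port)
{
	uint32_t hdrlen = ip_hlen + TCP_HDR_LEN + tcp_optlen;

	/* With a UDP tunnel, a UDP header sits between IP and TCP. */
	if (udp_tunnel_port != 0)
		hdrlen += UDP_HDR_LEN;
	return hdrlen;
}
```

Issue B follows the same logic: a buffer sized for the headers should reserve the extra UDP_HDR_LEN bytes even when no tunnel is active yet.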