As suggested by lstewart, remove the non-working SCTP support in the
TCP congestion control modules. SCTP has a similar functionality
(although not using kernel loadable modules), on which the TCP stuff
was built on, but the integration was never done.
No functional change intended.
Reviewed by: Peter Lei, cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46142
Summary:
This variable was added by commit eb5bfdd065, but unnecessarily needed.
No functional change.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46042
This is a migitation to avoid sudden extreme jumps in
cwnd, as t_epoch can be very out of date after an RTO.
Per RFC9438, sec 4.8, t_epoch is to be reset whenever
cwnd grows beyond ssthresh (CC phase transitions from
slow start to congestion avoidance), to be fixed with
the upcoming cc_cubic changes.
MFC after: 3 days
Reviewed By: cc, #transport
Sponsored by: NetApp, Inc
Differential Revision: https://reviews.freebsd.org/D44023
Facilitate easier troubleshooting by enumerating
all congestion control signals. Typecast the
enum to int, when a congestion control module uses
private signals.
No external change.
Reviewed By: glebius, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43838
Make sure the divident is at least one. While cwnd should
never be smaller than t_maxseg, this can happen during
Path MTU Discovery, or when TCP options are considered
in other parts of the stack.
PR: 276674
MFC after: 3 days
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43797
Follow up on D43768 to properly deal with the non-default
pipe calculation. When CC_RTO is processed, the timeout
will have already pulled back snd_nxt. Further, snd_fack
is not pulled along with snd_una.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43876
per RFC5681, only adjust ssthresh on the initital
retransmission timeout. Since RTO often happens
during loss recovery, while cwnd no longer tracks
all data in flight, calculcate pipe properly.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43768
tcp_fixed_maxseg() is the streamlined calculation of typical
tcp options and more suitable for heavy use in the congestion
control modules on every received packet.
No external functional change.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43779
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Prepare the cubic congestion control module to better align with
the specifications in RFC8312bis.
Rename a few cubic state variables to the variable names found in
the RFC8312bis specification. This makes the code more understandable
for someone reading the RFC and the code. It also makes the variable
naming convention more uniform. Add some variables needed subsequently.
No functional change.
Submitted By: Bhaskar Pardeshi, VMware Inc.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40436
This improves TCP friendly cwnd in cases of low latency high drop rate
networks. Tests show +42% and +37% better performance in 1Gpbs and 10Gbps
cases.
Reported by: Bhaskar Pardeshi from VMware.
Reviewed By: rscheff, tuexen
Approved by: rscheff (mentor), tuexen (mentor)
When using rack, cubic and htcp will grab the srtt, but they think it is in ticks. For rack
it is in micro-seconds (which we should probably move all stacks to actually). This causes
issues so instead lets make a new interface so that any CC module can pull the srtt in
whatever granularity they want.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D40146
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.
Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831
Right now rack will fail to load due to its hack in accessing symbol names
in cc_newreno. This was fine when newreno was always compiled into the
kernel but now ... not so much. Instead lets fix up rack to use the socket
option queries to get the information it wants and set the parameters. We
also fix the CC parameter so they are always settable.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37622
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.
The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.
The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.
One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.
Large part of the change was done with sed script:
s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.
Differential revision: https://reviews.freebsd.org/D37127
This adds the capability for a modular congestion control
to select which variant of ECN-capable-transport it wants to use
when sending out elegible segments. As an initial CC to utilize
this, DCTCP was selected.
Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24869
Without this patch, loading the RACK stack required the newreno
CC module to be compiled into the kernel. This is not the case
anymore since CUBIC is the default now.
Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D36707
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.
Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
Consistently refer to the CUBIC congestion control
mechanism in uppercase throughout all comments.
No functional change.
Reviewed By: #transport, tuexen, mav, guest-ccui, emaste
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36547
In the changes to get hystart++ into cubic an inadvertent line
was removed in the conditional to figure out if you need to exit
hystart++ back to slowstart. The line of course is the most crucial
one (the others are valid but not critical) i.e. is the new rtt
less than the point where we entered hystart++. Without the line
we end up bouncing in and out of CSS.
Reported By: Reese Enghardt
Sponsored By: Netflix Inc.
In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.
This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.
Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
Its supposed to be an EBUSY error. Make it so.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
This causes new vnets to inherit the cc algorithm from vnet0. This is a
temporary patch to fix vnet jail creation.
With encouragement from: glebius
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Differential Revision: https://reviews.freebsd.org/D32970
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
TCP Hystart draft version -03:
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus
Is a new version of hystart that allows one to carefully exit slow start if the RTT
spikes too much. The newer version has a slower-slow-start so to speak that then
kicks in for five round trips. To see if you exited too early, if not into congestion avoidance.
This commit will add that feature to our newreno CC and add the needed bits in rack to
be able to enable it.
Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32373
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
or it is requested by the stack.
Reviewed by: rscheff, rrs
MFC after: 3 days
Sponsored by: Netflix, Inc.
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..
Sponsored by: Netflix
Reviewed by: rscheff, mtuexen
Differential Revision: https://reviews.freebsd.org/D30036