Fix a typo, which resulted in missing r_ctl.gate_to_fs in the BBLog
event.
Reported by: Coverity Scan
CID: 1540024
Reviewed by: rrs, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44648
RFC 9293 describes the handling of data in the CLOSE-WAIT, CLOSING,
LAST-ACK, and TIME-WAIT states:
This should not occur since a FIN has been received from the remote
side. Ignore the segment text.
Therefore, implement this handling.
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44746
struct tcpcb embeds a struct osd and a struct callout. Rather than
forcing all consumers to pull in the same headers, include the headers
directly.
No functional change intended.
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44685
When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.
Add a static probe point for them so that we can use dtrace to get
futher information (e.g. a stack trace).
For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT
Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504
Also log, when dropping text or FIN after having received a FIN.
This is the intended behavior described in RFC 9293.
A follow-up patch will enforce this behavior for the base stack
and the RACK stack.
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44669
When in rack_output() jumping to the label out, don't write errno into
the log buffer, since the pointer is not initialized.
Reported by: Coverity Scan
CID: 1523773
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44647
In rack_output(), idle is used as a boolean variable. So don't use it
as an int and don't clear it afterwards.
This avoids setting idle to false, when it is not intended.
Reported by: olivier
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44610
Ensure that tv.tv_sec is zero in all code paths.
Reported by: Coverity Scan
CID: 1527724
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44584
t_state is an unsigned variable, so no need for testing that it is
non-negative.
Reported by: Coverity Scan
CID: 1390885
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44619
Make the comment consistent with the code.
Reviewed by: rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44611
The target_slot argument of max_slots_available() can be NULL.
Therefore, check for this in all places.
Right now, all callers provide non-NULL pointer.
Reported by: Coverity Scan
CID: 1527732
Reviewed by: rrs
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44527
Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f, e68b379244, 483fe96511) this pointer used to point
at it.
Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The
exp-run detected a minimal use of the field in ports:
* sysutils/lsof - patched upstream
* net-mgmt/netdata - patch accepted upstream
* emulators/qemu-user-static - upstream master branch seems not using
the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.
PR: 277659 (exp-run)
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44491
HPTS inserts a softclock for system call return that optimizes performance. However when
no HPTS threads need the help (i.e. when they have less than 100 or so connections) then
there should be little work done i.e. check the counter and return instead of running through
all the threads getting locks etc.ptimize HPTS so that little work is done until we have a hpts
thread that is over the connection threshold.
Reported by: eduardo
Reviewed by: gallatin, glebius, tuexen
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D44420
The length of tldl_reason is TCP_LOG_REASON_LEN, not TCP_LOG_ID_LEN.
No functional change intended.
Reported by: Coverity Scan
CID: 1418074
CID: 1418276
Reviewed by: glebius, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44510
Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration create self-documenting code. There should be no
functional changes.
Differential Revision: https://reviews.freebsd.org/D44481
Instead of fixing up invalid values set by a user in badport_bandlim()
which is a fast path function, provide a sysctl handler
sysctl_icmplim_and_jitter(), that will check that jitter is less than the
limit.
Provide jitter initilization function icmplim_new_jitter() used at boot,
in the sysctl handler and when we actually hit the limit. This also fixes
no jitter on a fresh booted system until first limit hit.
Instead of CVE number provide link the the actual paper that explains what
and why we are doing here. The CVE number isn't very informative, it will
just tell you what RedHat version you need to upgrade to.
Reviewed by: kp, tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44478
The limiting of the very last second has been done using certain jitter
value. We update the jitter for the next second. But the logging should
report the jitter before the change.
Reviewed by: kp, tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44477
The uninitialization may be executed only on a kernel with VIMAGE.
Reviewed by: kp, tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44476
We need per-VNET struct counter_rate, but we don't need per-VNET set of
const char *. Also, identical word "response" can go into the format
string instead of being stored 7 times.
Reviewed by: kp, zlei, tuexen
Differential Revision: https://reviews.freebsd.org/D44475
Ensure that there is no data on SYN segments unless doing TFO.
This check is already in RACK and BBR.
Reported by: glebius
Reviewed by: rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44384
This socket option can be used by in-kernel consumers (like NFS) to
request a NIC to use optimized receive of large buffers for a
connection. The current use case is to support DDP by the TOE on
Chelsio NICs.
Reviewed by: rscheff, tuexen, glebius
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44000
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
PR: 277349
When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling
tcp_close(), we must also clear all other possible timers. Otherwise,
upon return, the callout would be scheduled again in tcp_timer_enter().
Revert 57e27ff07a, which was a temporary partial revert of otherwise
correct 62d47d73b7, that exposed the problem being fixed now. Add an
extra assertion in tcp_timer_enter() to check we aren't arming callout for
a closed connection.
Reviewed by: rscheff
The macro is more obfuscating than helping as it just checks a single flag
of t_flags. All other t_flags bits are checked without a macro.
A bigger problem was that declaration of the macro in tcp_var.h depended
on a kernel option. It is a bad practice to create such definitions in
installable headers.
Reviewed by: rscheff, tuexen, kib
Differential Revision: https://reviews.freebsd.org/D44362
These KPIs were added in dd0e6c383a and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the sockets it is unclear what could be a large
collection of externally written kernel modules that need extensively use
sockets and not be aware of their internals at the same time. This
isolation of a structure knowledge requires a lot of work, and just
throwing in a few KPIs isn't helpful.
Reviewed by: kib, olce, markj
Differential Revision: https://reviews.freebsd.org/D44311
These KPIs were added in 9d29c635da and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the inpcb it is unclear what could be a large
collection of externally written kernel modules that need extensively use
inpcb and not be aware of its internals at the same time. This isolation
of a structure knowledge requires a lot of work, and just throwing in a
few KPIs isn't helpful.
Reviewed by: kib, bz, markj
Differential Revision: https://reviews.freebsd.org/D44310
and drop the definition for userspace (which matched TCP_RFC7413) since
it depends on presence of the kernel option.
Reviewed by: glebius, rscheff
Sponsored by: NVIDIA networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44349
Doing a deep copy of the keys early allows users of the
tls_enable structure to assume kernel memory.
This enables the socket options to be set by kernel threads.
Reviewed By: #transport, tuexen, jhb, rrs
Sponsored by: NetApp, Inc.
X-NetApp-PR: #79
Differential Revision: https://reviews.freebsd.org/D44250
This brings the rack stack up to the current level used at NF. Many fixes
and improvements have been added. I also add in a fix to BBR to deal with
the changes that have been in hpts for a while i.e. only one call no matter
if mbuf queue or tcp_output.
It basically does little except BBlogs and is a placemark for future work on
doing path capacity measurements.
With a bit of a struggle with git I finally got rack_pcm.c into place (apologies
for not noticing this error). The LINT kernel is running on my box now .. sigh.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision:https://reviews.freebsd.org/D43986
This brings the rack stack up to the current level used at NF. Many fixes
and improvements have been added. I also add in a fix to BBR to deal with
the changes that have been in hpts for a while i.e. only one call no matter
if mbuf queue or tcp_output.
Note there is a new file that I can't figure out how to get in rack_pcm.c
It basically does little except BBlogs and is a placemark for future work on
doing path capacity measurements.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision:https://reviews.freebsd.org/D43986
When doing mbuf queueing, the packet filter hooks in ether_demux(),
ip_input(), and ip6_input() are by-passed. This means that the packet
filters don't process incoming packets, which might result in
connection failures. For example bypassing the TCP sequence number
validation will result in dropping valid packets.
Please note that this patch is only disabling mbuf queueing, not LRO.
Reported by: Herbert J. Skuhra
Reviewed by: glebius, rrs, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D43769
Visibility into the contents of the buffer when a write(2) has failed
can be immensely useful in debugging IPC issues -- pushing this to
discuss the idea, or maybe an alternative where we can set a flag like
KTRFAC_ERRIO to enable it.
When a genio event is potentially raised after an error, currently we'll
just free the uio and return. However, such data can be useful when
debugging communication between processes to, e.g., understand what the
remote side should have grabbed before closing a pipe. Tap out the
entire buffer on failure rather than simply discarding it.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D43799
Ok lets fix up the tcp_in_hpts() so that it also says yes if you
are in the race state moving and you are scheduled to be put in.
This also requires changing the MPASS to be the old version non
inline function of tcp_in_hpts().
This change also adds a new inline macro so that a uint64_t timestamp can be
obtained by a transport (aka Rack will use this).
Reviewed by: glebius, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D44157
This is a migitation to avoid sudden extreme jumps in
cwnd, as t_epoch can be very out of date after an RTO.
Per RFC9438, sec 4.8, t_epoch is to be reset whenever
cwnd grows beyond ssthresh (CC phase transitions from
slow start to congestion avoidance), to be fixed with
the upcoming cc_cubic changes.
MFC after: 3 days
Reviewed By: cc, #transport
Sponsored by: NetApp, Inc
Differential Revision: https://reviews.freebsd.org/D44023
Ensure that snd_fack holds a valid value when doing
the post_recovery CC processing, for preparation of
the cc_cubic update, so that local pipe calculations
can correctly refer to snd_fack during and after CC events.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43957