flsll is inlined, or replaced by a smart binary search implementation,
on all architectures, and HAVE_INLINE_FLSLL is #defined always. So
remove code the the #undefined case.
Reviewed by: mhorne, tuexen
Differential Revision: https://reviews.freebsd.org/D40704
Improve consistency of the field names with tcpsinfo_t:
* Use mss instead of max_seg_size.
* Use lport and rport instead of tcp_localport and tcp_foreignport.
Use t_flags instead of flags to improve consistency with t_flags2.
Add laddr and raddr, since the addresses were missing when compared
to the output of siftr.
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40834
Currently nothing prevents tcp_usr_connect() from attempting to connect
when the socket has been disconnected. At the moment, doing so triggers
an assertion in in_pcbconnect() because inp_faddr is not unspecified. I
believe this may have been caught in the past by TIMEWAIT checks, but
those are now removed.
Check for additional socket states in tcp_connect().
Reported by: syzbot+f0f7871ec5397602b446@syzkaller.appspotmail.com
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40579
guide on how to label FALLTHOUGH in switch statements.
No functional chance.
Reviewed By: tuexen, cc, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40622
So over the past few weeks we have found several bugs and updated hybrid pacing to have
more data in the low-level logging. We have also moved more of the BBlogs to "verbose" mode
so that we don't generate a lot of the debug data unless you put verbose/debug on.
There were a couple of notable bugs, one being the incorrect passing of percentage
for reduction to timely and the other the incorrect use of 20% timely Beta instead of
80%. This also expands a simply idea to be able to pace a cwnd (fillcw) as an alternate
pacing mechanism combining that with timely reduction/increase.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D40391
Prepare the cubic congestion control module to better align with
the specifications in RFC8312bis.
Rename a few cubic state variables to the variable names found in
the RFC8312bis specification. This makes the code more understandable
for someone reading the RFC and the code. It also makes the variable
naming convention more uniform. Add some variables needed subsequently.
No functional change.
Submitted By: Bhaskar Pardeshi, VMware Inc.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40436
Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2)
allow to restrict the maximum number of consecutive timer based
retransmissions. Add that same capability on a per-VNet basis to
FreeBSD.
Reviewed By: cc, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40424
The pfil hook may modify the packet, so before we check its length (to
decide if it needs to be fragmented or not) we should re-read that
length.
This is most likely to happen when pf is reassembling packets. In that
scenario we'd receive the last fragment, which is likely to be a short
packet, pf would reassemble it (likely exceeding the interface MTU) and
then we'd transmit it without fragmenting, because we're comparing the
MTU to the length of the last fragment, not the fully reassembled
packet.
See also: https://redmine.pfsense.org/issues/14396
Reviewed by: cy
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40395
Ensure that a user specified value of TTL/hoplimit and DSCP is
used when sending packets.
Reviewed by: cc, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40423
This improves TCP friendly cwnd in cases of low latency high drop rate
networks. Tests show +42% and +37% better performance in 1Gpbs and 10Gbps
cases.
Reported by: Bhaskar Pardeshi from VMware.
Reviewed By: rscheff, tuexen
Approved by: rscheff (mentor), tuexen (mentor)
Having it configurable adds more flexibility, especially
for the systems with low amount of memory.
Additionally, it allows to speedup frag6/ tests execution.
Reviewed by: kp, markj, bz
Differential Revision: https://reviews.freebsd.org/D35755
MFC after: 2 weeks
Refactor tcp_get_srtt() into its two component operations: unit
conversion and shifting. No functional change is intended.
Reviewed by: cc, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D40304
Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.
When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect, the reply is
not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).
This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:
pfilctl link -o pf:default-out inet-local
pfilctl link -o pf:default-out6 inet6-local
PR: 268717
Reviewed-by: kp, melifaro
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40256
When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address. This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.
After commit 7b92493ab1, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless. Restore the omitted
checks.
Fixes: 7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by: peter
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D40268
Summary:
(1) use inp_flowid or a new packet hash for a flow identification
(2) cache constant connection info into struct flow_hash_node
(3) use compressed notation for IPv6 address representation
Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40302
This change is a name change only. TCP Request tracking can track sendfile and even non-sendfile requests. The
names however in the current code use http, and they should not. The feature is not http specific. Lets change the
name so they more properly reflect whats going on. This also fixes conflicts with http_req which caused application pain.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D40229
When using rack, cubic and htcp will grab the srtt, but they think it is in ticks. For rack
it is in micro-seconds (which we should probably move all stacks to actually). This causes
issues so instead lets make a new interface so that any CC module can pull the srtt in
whatever granularity they want.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D40146
Summary:
This brings some benefit of a tcp flow identification for some kernel
modules, such as siftr.
Reviewers: rrs, rscheff, tuexen, #transport!
Approved by: tuexen (mentor), rrs
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40061
If the peer6 address is a link-local address we have to embed the
scopeid, much like we have to for IPv6 multicast as well.
Sponsored by: Rubicon Communications, LLC ("Netgate")
After removing the -FreeBSD and -NetBSD, we're left with a nuber of
BSD-2-Clause AND BSD-2-Clause, so tidy that up.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
When the TCP_LOG option is used to enable logging on a listening
socket, inherit this if the listener is not auto selected and does
not have a log id set.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38436
Log all errors for PRUs, except when INP_DROPPED is set. In that case,
don't log it.
Reviewed by: glebius, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D39591
When shutdown(..., SHUT_WR) is called in the front states, send a
SHUTDOWN chunk when a COOKIE ACK chunk is received and there is
no outstanding data.
MFC after: 1 week
* If a measure of staleness of 0 is reported, use the RTT instead.
* Ensure that we always send a cookie preservative parameter by
rounding up during the calculation.
* If allowed, perform a round trip time measurement.
* Clear the overall error counter, since the error cause also
acts like an ACK.
MFC after: 1 week
These flags are TCP specific. While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39698
This makes inpcb lighter and allows future cache line optimizations
of tcpcb. The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193a), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39697
The purge was intentionally removed in a540cdca31. My assumption
was that the stacks that use the input queue always call the
tcp_handle_orphaned_packets() in their tfb_tcp_fb_fini method.
However, rack will skip doing that if t_fb_ptr is NULL and there are
scenarios when it is NULL, e.g. close(2) on a socket (but some
special close(2)). Instead of working out all possible scenarios
let's put this safebelt back.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39696
Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.
Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after: 1 month