Commit graph

7424 commits

Author SHA1 Message Date
Dimitry Andric
57cdd13d07 Suppress unused variable warning in tcp_stacks's rack.c
With clang 15, the following -Werror warning is produced:

    sys/netinet/tcp_stacks/rack.c:17405:12: error: variable 'outstanding' set but not used [-Werror,-Wunused-but-set-variable]
                    uint32_t outstanding;
                             ^

The 'outstanding' variable was used later in the rack_output() function,
but refactoring in 35c7bb3407 removed the usage. To avoid too much
code churn, mark the variable unused to supress the warning.

MFC after:	3 days
2022-08-14 21:27:35 +02:00
Dimitry Andric
e967183cb0 Fix unused variable warning in tcp_stacks's rack.c
With clang 15, the following -Werror warning is produced:

    sys/netinet/tcp_stacks/rack.c:16148:6: error: variable 'cnt_thru' set but not used [-Werror,-Wunused-but-set-variable]
            int cnt_thru = 1;
                ^

The 'cnt_thru' variable is only used when TCP_ACCOUNTING is defined.
Ensure it is only declared and set in that case.

MFC after:	3 days
2022-08-14 21:27:34 +02:00
Dimitry Andric
7624896571 Fix unused variable warning in tcp_stacks's bbr.c
With clang 15, the following -Werror warning is produced:

sys/netinet/tcp_stacks/bbr.c:11925:11: error: variable 'rtr_cnt' set but not used [-Werror,-Wunused-but-set-variable]
        uint32_t rtr_cnt = 0;
                 ^

The 'rtr_cnt' variable was in bbr.c when it was first added, but it
appears to have been a debugging aid that has never been used, so remove
it.

MFC after:	3 days
2022-08-14 21:27:34 +02:00
Andrew Gallatin
2c6ff1d632 LRO: fix BPF filters for lagg in the hpts path
When in the hpts path, we need to handle BPF filters since aggregated
packets do not pass up the stack in the normal way. This is already
done for most interfaces, but lagg needs special handling. This is
because packets received via a lagg are passed up the stack with
the leaf interface's ifp stored in m_pkthdr.rcvif.

To handle lagg packets, we must identify that the passed rcvif is
currently a lagg port by checking for IFT_IEEE8023ADLAG or
IFT_INFINIBANDLAG (since lagg changes the lagg port's type to that
when an interface becomes a lagg member). Then we need to find the
lagg's ifp, and handle any BPF listeners on the lagg.

Note: It is possible to have multiple BPF filters, one on a member
port and one on the lagg itself. That is why we have to have 2
checks and 2 ETHER_BPF_MTAPs.

Reviewed by: jhb, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36136
2022-08-13 17:33:36 -04:00
Gleb Smirnoff
f277746e13 protosw: change prototype for pr_control
For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C.  Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.

Fixes:	886fc1e804
2022-08-12 12:08:18 -07:00
Gleb Smirnoff
948f31d7b0 netinet: do not broadcast PRC_REDIRECT_HOST on ICMP redirect
This is expensive and useless call.  It has been useless since Alexander
melifaro@ moved the forwarding table to nexthops with passive invalidation.
What happens now is that cached route in a inpcb would get invalidated
on next ip_output().

These were the last users of pfctlinput(), so garbage collect it.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36156
2022-08-12 08:31:29 -07:00
Gleb Smirnoff
3d2041c035 raw ip: merge rip_output() into rip_send()
While here, address the unlocked 'dst' read.  Solve that by storing
a pointer either to the inpcb or to the sockaddr.  If we end up
copying address out of the inpcb, that would be done under the read
lock section.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36127
2022-08-11 09:19:37 -07:00
Gleb Smirnoff
8c77967ecc protosw: retire pr_output method
The only place to execute this method was raw_usend(). Only those
protocols that used raw socket were able to actually enter that method.
All pr_output assignments being deleted by this commit were a dead code
for many years.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36126
2022-08-11 09:19:37 -07:00
Gleb Smirnoff
b8103ca76d netinet: get interface event notifications directly via EVENTHANDLER(9)
The old mechanism of getting them via domains/protocols control input
is a relict from the previous century, when nothing like EVENTHANDLER(9)
existed yet.  Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one
to use them.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36116
2022-08-11 09:19:36 -07:00
Gleb Smirnoff
07285bb4c2 tcp: utilize new solisten_clone() and solisten_enqueue()
This streamlines cloning of a socket from a listener.  Now we do not
drop the inpcb lock during creation of a new socket, do not do useless
state transitions, and put a fully initialized socket+inpcb+tcpcb into
the listen queue.

Before this change, first we would allocate the socket and inpcb+tcpcb via
tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock
pcb and put this onto incomplete queue (see 6f3caa6d81).  Then, after
sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED,
insert into inpcb hash, finalize initialization of tcpcb.  And then, in
call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call
soisconnected().  This call would lock the listening socket once again
with a LOR protection sequence and then we would relocate the socket onto
the complete queue and only now it is ready for accept(2).

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36064
2022-08-10 11:09:34 -07:00
Gleb Smirnoff
c7a62c925c inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36062
2022-08-10 11:09:34 -07:00
Gleb Smirnoff
d88eb4654f tcp: address a wire level race with 2 ACKs at the end of TCP handshake
Imagine we are in SYN-RCVD state and two ACKs arrive at the same time,
both valid, e.g. coming from the same host and with valid sequence.

First packet would locate the listening socket in the inpcb database,
write-lock it and start expanding the syncache entry into a socket.
Meanwhile second packet would wait on the write lock of the listening
socket.  First packet will create a new ESTABLISHED socket, free the
syncache entry and unlock the listening socket.  Second packet would
call into syncache_expand(), but this time it will fail as there
is no syncache entry.  Second packet would generate RST, effectively
resetting the remote connection.

It seems to me, that it is impossible to solve this problem with
just rearranging locks, as the race happens at a wire level.

To solve the problem, for an ACK packet arrived on a listening socket,
that failed syncache lookup, perform a second non-wildcard lookup right
away.  That lookup may find the new born socket.  Otherwise, we indeed
send RST.

Tested by:		kp
Reviewed by:		tuexen, rrs
PR:			265154
Differential revision:	https://reviews.freebsd.org/D36066
2022-08-10 07:32:37 -07:00
Michael Tuexen
bd30a1216e tcp: improve BBLog for output events when using the FreeBSD stack
Put the return value of ip_output()/ip6_output in the output event
instead of adding another one in case of an error. This improves
consistency with other similar places.

Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D36085
2022-08-08 13:07:10 +02:00
Michael Tuexen
bb995f2ef0 sctp: improve handling of send() calls with no user data`
In particular, don't report EAGAIN on send() calls with no user
data, which might trigger a KASSERT in asyc IO.

Reported by:	syzbot+3b4dc5d1d63e9bd01eda@syzkaller.appspotmail.com
MFC after:	1 week
2022-08-08 12:53:42 +02:00
Gleb Smirnoff
e7231d07a4 tcp_input: update comment to match reality. 2022-08-07 11:18:30 -07:00
Michael Tuexen
979bc32c7c sctp: tweak panic message
MFC after:	1 week
2022-08-03 17:28:15 +02:00
Mike Karels
637f317c6d IPv6: fix problem with duplicate port assignment with v4-mapped addrs
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6
socket using in6_pcblookup_local(), and if the socket can also connect
to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well.
Otherwise, we can allocate a port that is used by an IPv4 socket
(possibly one created from IPv6 via the same procedure), and then
connect() can fail with EADDRINUSE, when it could have succeeded if
the bound port was not in use.

PR:		265064
Submitted by:	firk at cantconnect.ru (with modifications)
Reviewed by:	bz, melifaro
Differential Revision: https://reviews.freebsd.org/D36012
2022-08-02 09:49:46 -05:00
Michael Tuexen
1abc27dd52 tcp rack: simplify computation of rsm start and end
While there, also fix the setting of the SYN related flag.

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D35862
2022-08-02 12:45:56 +02:00
Alexander V. Chernikov
ae6bfd12c8 routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
 indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
 group+weight.

MFC after:	2 weeks
2022-08-01 10:02:12 +00:00
Alexander V. Chernikov
800c68469b routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985
MFC after:	1 month
2022-08-01 08:52:26 +00:00
Dimitry Andric
24e13a49fa Adjust sctp_drain() definition to avoid clang 15 warning
With clang 15, the following -Werror warning is produced:

    sys/netinet/sctp_pcb.c:6946:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    sctp_drain()
              ^
               void

This is because sctp_drain() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.

MFC after:	3 days
2022-07-26 19:59:55 +02:00
Dimitry Andric
2057985649 Adjust sctp_init_sysctls() definition to avoid clang 15 warning
With clang 15, the following -Werror warning is produced:

    sys/netinet/sctp_sysctl.c:55:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    sctp_init_sysctls()
                     ^
                      void

This is because sctp_init_sysctls() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.

MFC after:	3 days
2022-07-25 22:08:35 +02:00
Dimitry Andric
5bfd8cf369 Fix unused variable warning in sctp_timer.c
With clang 15, the following -Werror warning is produced:

    sys/netinet/sctp_timer.c:510:6: error: variable 'recovery_cnt' set but not used [-Werror,-Wunused-but-set-variable]
            int recovery_cnt = 0;
                ^

The 'recovery_cnt' variable is only used when INVARIANTS is undefined.
Ensure it is only declared and set in that case.

MFC after:	3 days
2022-07-25 22:08:28 +02:00
Dimitry Andric
9057feddc4 Fix unused variable warning in sctp_output.c
With clang 15, the following -Werror warning is produced:

    sys/netinet/sctp_output.c:9367:33: error: variable 'cnt_thru' set but not used [-Werror,-Wunused-but-set-variable]
            int no_fragmentflg, bundle_at, cnt_thru;
                                           ^

The 'cnt_thru' variable was in sctp_output.c when it was first added,
but appears to have been a debugging aid that has never been used, so
remove it.

MFC after:	3 days
2022-07-25 21:50:51 +02:00
Dimitry Andric
05b3a4282c Fix unused variable warnings in sctp_indata.c
With clang 15, the following -Werror warnings are produced:

    sys/netinet/sctp_indata.c:3309:6: error: variable 'tot_retrans' set but not used [-Werror,-Wunused-but-set-variable]
            int tot_retrans = 0;
                ^
    sys/netinet/sctp_indata.c:3842:20: error: variable 'resend' set but not used [-Werror,-Wunused-but-set-variable]
            int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
                              ^
    sys/netinet/sctp_indata.c:3842:47: error: variable 'acked' set but not used [-Werror,-Wunused-but-set-variable]
            int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
                                                         ^
    sys/netinet/sctp_indata.c:3842:58: error: variable 'above' set but not used [-Werror,-Wunused-but-set-variable]
            int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
                                                                    ^

The 'tot_retrans' variable was used in sctp_strike_gap_ack_chunks(), but
refactoring in 493d8e5a83 got rid of it. Remove the variable since it
no longer serves any purpose.

The 'resend', 'acked', and 'above' variables are only used when
INVARIANTS is undefined. Ensure they are only declared and set in that
case.

MFC after:	3 days
2022-07-25 21:16:05 +02:00
Mike Karels
fb8ef16bab IPv4: correct limit on loopback_prefix
Commit efe58855f3 allowed the net.inet.ip.loopback_prefix value
to be 32.  However, with a 32-bit mask, 127.0.0.1 is not included
in the reserved loopback range, which should not be allowed.
Change the max prefix length to 31.
2022-07-21 09:38:17 -05:00
Michael Tuexen
5b741298b1 tcp rack: fix switching to RACK when FIN has been sent
Fix the rack sendmap entry in case a FIN has been sent when the
stack is switched over to RACK.

Reported by:		syzbot+dd55e316428419e9354b@syzkaller.appspotmail.com
Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D35731
2022-07-19 20:28:25 +02:00
Richard Scheffenegger
66605ff791 tcp: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error.
If an error occurs while processing a TCP segment with some data and the FIN
flag, the back out of the sequence number advance does not take into account the
increase by 1 due to the FIN flag.

Reviewed By: jch, gnn, #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D2970
2022-07-14 03:18:19 +02:00
Mike Karels
efe58855f3 IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16).  All changes are disabled by default, and can be
enabled by the following sysctls:

    net.inet.ip.allow_net0=1
    net.inet.ip.allow_net240=1
    net.inet.ip.loopback_prefixlen=16

When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.

Add descriptions of the new sysctls to inet.4.

Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.

The proposals motivating this experimentation can be found in

    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127

Reviewed by:	rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
2022-07-13 09:46:05 -05:00
Gleb Smirnoff
aeb6948d43 bbr: check proper flag for connection had been closed
An older version of D35663 slipped through final reviews.

Submitted by:	Peter Lei
Fixes:		74703901d8
2022-07-08 22:04:44 -07:00
Gleb Smirnoff
1b91978f63 tcp: remove a condition in tcp_usr_detach() that never happens
The comment from Robert Watson doubts that this condition ever happens.
Our analysis confirm that.  Also, we found that if you manage to create
such a connection with help of some other bug, then after the "second
case" code is executed, the kernel will panic in other part of the stack.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D35714
2022-07-06 21:09:45 -07:00
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Gleb Smirnoff
d8596171c5 sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already
  existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private.  The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().

Note on listening queues.  Until now the sockets on a queue had zero
reference count.  And the reference were given only upon accept(2).  The
assumption was that there is no way to see the queued socket from anywhere
except its head.  This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists.  With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.

Differential revision:	https://reviews.freebsd.org/D35679
2022-07-04 12:40:51 -07:00
Gleb Smirnoff
74703901d8 tcp: use a TCP flag to check if connection has been close(2)d
The flag SS_NOFDREF is a private flag of the socket layer.  It also
is supposed to be read with SOCK_LOCK(), which we don't own here.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D35663
2022-07-04 12:40:51 -07:00
Gleb Smirnoff
ad3ad06477 blackhole(4): fix operator precedence
Fixes:	3ea9a7cf7b
2022-06-27 17:52:19 -07:00
Michael Tuexen
121ecca0d8 sctp: add KASSERTs to ensure correct handling of listeners
This was suggested by markj@.

MFC after:	3 days
2022-06-27 19:04:45 +02:00
Gleb Smirnoff
bafe71fd27 sctp: do not clobber listening socket with sockbuf operations
The problem was here since 779f106aa1, but a4fc41423f turned it
into a panic.

Reviewed by:	tuexen
Reported by:	syzcaller
2022-06-27 09:24:49 -07:00
Hans Petter Selasky
f5766992c0 tcp: Correctly compute the TCP goodput in bits per second by using SEQ_SUB().
TCP sequence number differences should be computed using SEQ_SUB().

Differential Revision:	https://reviews.freebsd.org/D35505
Reviewed by:	rscheff@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-23 21:10:39 +02:00
Claudio Jeker
97453e5e72 Unlock inp when handling TCP_MD5SIG socket options
Unlock the inp when hanlding TCP_MD5SIG socket options. tcp_ipsec_pcbctl
handles locking the inp when the option is being modified.

This was found by Claudio Jeker while working on the OpenBGPd port.

On 14 we get a panic when trying to call getsockopt, on 13.1 the process
locks up using 100% CPU.

Reviewed by:	rscheff (transport), tuexen
MFC after:	3 days
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D35532
2022-06-23 15:57:56 +01:00
Michael Tuexen
bf6c6162c7 tcp: fix TCPPCAP for kernels enabling VNET
Reviewed by:		rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D35503
2022-06-15 23:28:54 +02:00
Michael Tuexen
ee9ee699d6 sctp: remove book keeping not needed anymore
MFC after:	3 days
2022-06-08 23:30:52 +02:00
Michael Tuexen
ad6ae52d1c sctp: cleanup, no functional change
MFC after:	3 days
2022-06-08 22:35:14 +02:00
Richard Scheffenegger
57317c8971 tcp: exclude KASSERTS when rescue retransmissions are in play.
The KASSERT criteria needs to be checked against the
sendbuffer so_snd in a subsequent version.

Reviewed By:	tuexen, #transport
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35431
2022-06-08 14:51:31 +02:00
Richard Scheffenegger
ce2525c810 tcp: remove goto and address another NULL deref in SACK
Missed another NULL dereference during KASSERTS after traversing
the scoreboard. While at it, scratch the goto by making the
traversal conditional, and remove duplicate checks using an
unconditional loop with all checks inside.

Reviewed By:	hselasky
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35428
2022-06-08 09:18:32 +02:00
Richard Scheffenegger
231e0dd5d1 tcp: skip sackhole checks on NULL
Inadvertedly introduced NULL pointer dereference during
sackhole sanity check in D35387.

Reviewed By:	glebius
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35423
2022-06-07 18:18:42 +02:00
Richard Scheffenegger
91d6afe6e2 tcp: Sanity check of SACK holes on retransmissions
Adding a few KASSERT() to validate sanity of sack holes, and
bail out if sack hole is inconsistent to avoid panicing non-invariant builds.

Reviewed By:	hselasky, glebius
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D35387
2022-06-07 09:38:16 +02:00
Arseny Smalyuk
81cac3906e ipfw: add support radix tables and table lookup for MAC addresses
By analogy with IP address matching, add a way to use ipfw radix
tables for MAC matching. This is implemented using new ipfw table
with mac:radix type. Also there are src-mac and dst-mac lookup
commands added.

Usage example:
  ipfw table 1 create type mac
  ipfw table 1 add 11:22:33:44:55:66/48
  ipfw add skipto tablearg src-mac 'table(1)'
  ipfw add deny src-mac 'table(1, 100)'
  ipfw add deny lookup dst-mac 1

Note: sysctl net.link.ether.ipfw=1 should be set to enable ipfw
filtering on L2.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D35103
2022-06-04 19:12:29 +03:00
Gordon Bergling
32a01b2b86 rack: Fix a common typo in comments and a sysctl description
- s/multipler/multiplier/

MFC after:	3 days
2022-06-04 17:56:56 +02:00
Gordon Bergling
c93db89231 rack: Fix a typo in a source code comment
- s/enought/enough/

MFC after:	3 days
2022-06-04 15:32:59 +02:00
Gordon Bergling
bd9e23c0a9 rack: Fix a typo in a source code comment
- s/continous/continuous/

MFC after:	3 days
2022-06-04 13:27:29 +02:00