This streamlines cloning of a socket from a listener. Now we do not
drop the inpcb lock during creation of a new socket, do not do useless
state transitions, and put a fully initialized socket+inpcb+tcpcb into
the listen queue.
Before this change, first we would allocate the socket and inpcb+tcpcb via
tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock
pcb and put this onto incomplete queue (see 6f3caa6d81). Then, after
sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED,
insert into inpcb hash, finalize initialization of tcpcb. And then, in
call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call
soisconnected(). This call would lock the listening socket once again
with a LOR protection sequence and then we would relocate the socket onto
the complete queue and only now it is ready for accept(2).
Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D36064
Imagine we are in SYN-RCVD state and two ACKs arrive at the same time,
both valid, e.g. coming from the same host and with valid sequence.
First packet would locate the listening socket in the inpcb database,
write-lock it and start expanding the syncache entry into a socket.
Meanwhile second packet would wait on the write lock of the listening
socket. First packet will create a new ESTABLISHED socket, free the
syncache entry and unlock the listening socket. Second packet would
call into syncache_expand(), but this time it will fail as there
is no syncache entry. Second packet would generate RST, effectively
resetting the remote connection.
It seems to me, that it is impossible to solve this problem with
just rearranging locks, as the race happens at a wire level.
To solve the problem, for an ACK packet arrived on a listening socket,
that failed syncache lookup, perform a second non-wildcard lookup right
away. That lookup may find the new born socket. Otherwise, we indeed
send RST.
Tested by: kp
Reviewed by: tuexen, rrs
PR: 265154
Differential revision: https://reviews.freebsd.org/D36066
Put the return value of ip_output()/ip6_output in the output event
instead of adding another one in case of an error. This improves
consistency with other similar places.
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D36085
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6
socket using in6_pcblookup_local(), and if the socket can also connect
to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well.
Otherwise, we can allocate a port that is used by an IPv4 socket
(possibly one created from IPv6 via the same procedure), and then
connect() can fail with EADDRINUSE, when it could have succeeded if
the bound port was not in use.
PR: 265064
Submitted by: firk at cantconnect.ru (with modifications)
Reviewed by: bz, melifaro
Differential Revision: https://reviews.freebsd.org/D36012
While there, also fix the setting of the SYN related flag.
Reviewed by: rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D35862
* Make nhgrp_get_nhops() return const struct weightened_nhop to
indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
group+weight.
MFC after: 2 weeks
With clang 15, the following -Werror warning is produced:
sys/netinet/sctp_pcb.c:6946:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
sctp_drain()
^
void
This is because sctp_drain() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/netinet/sctp_sysctl.c:55:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
sctp_init_sysctls()
^
void
This is because sctp_init_sysctls() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/netinet/sctp_timer.c:510:6: error: variable 'recovery_cnt' set but not used [-Werror,-Wunused-but-set-variable]
int recovery_cnt = 0;
^
The 'recovery_cnt' variable is only used when INVARIANTS is undefined.
Ensure it is only declared and set in that case.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/netinet/sctp_output.c:9367:33: error: variable 'cnt_thru' set but not used [-Werror,-Wunused-but-set-variable]
int no_fragmentflg, bundle_at, cnt_thru;
^
The 'cnt_thru' variable was in sctp_output.c when it was first added,
but appears to have been a debugging aid that has never been used, so
remove it.
MFC after: 3 days
With clang 15, the following -Werror warnings are produced:
sys/netinet/sctp_indata.c:3309:6: error: variable 'tot_retrans' set but not used [-Werror,-Wunused-but-set-variable]
int tot_retrans = 0;
^
sys/netinet/sctp_indata.c:3842:20: error: variable 'resend' set but not used [-Werror,-Wunused-but-set-variable]
int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
^
sys/netinet/sctp_indata.c:3842:47: error: variable 'acked' set but not used [-Werror,-Wunused-but-set-variable]
int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
^
sys/netinet/sctp_indata.c:3842:58: error: variable 'above' set but not used [-Werror,-Wunused-but-set-variable]
int inflight = 0, resend = 0, inbetween = 0, acked = 0, above = 0;
^
The 'tot_retrans' variable was used in sctp_strike_gap_ack_chunks(), but
refactoring in 493d8e5a83 got rid of it. Remove the variable since it
no longer serves any purpose.
The 'resend', 'acked', and 'above' variables are only used when
INVARIANTS is undefined. Ensure they are only declared and set in that
case.
MFC after: 3 days
Commit efe58855f3 allowed the net.inet.ip.loopback_prefix value
to be 32. However, with a 32-bit mask, 127.0.0.1 is not included
in the reserved loopback range, which should not be allowed.
Change the max prefix length to 31.
If an error occurs while processing a TCP segment with some data and the FIN
flag, the back out of the sequence number advance does not take into account the
increase by 1 due to the FIN flag.
Reviewed By: jch, gnn, #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D2970
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16). All changes are disabled by default, and can be
enabled by the following sysctls:
net.inet.ip.allow_net0=1
net.inet.ip.allow_net240=1
net.inet.ip.loopback_prefixlen=16
When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.
Add descriptions of the new sysctls to inet.4.
Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.
The proposals motivating this experimentation can be found in
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127
Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
The comment from Robert Watson doubts that this condition ever happens.
Our analysis confirm that. Also, we found that if you manage to create
such a connection with help of some other bug, then after the "second
case" code is executed, the kernel will panic in other part of the stack.
Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D35714
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.
Reviewed by: markj, jhb
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35582
o Retire SS_FDREF as it is basically a debug flag on top of already
existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private. The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().
Note on listening queues. Until now the sockets on a queue had zero
reference count. And the reference were given only upon accept(2). The
assumption was that there is no way to see the queued socket from anywhere
except its head. This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists. With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.
Differential revision: https://reviews.freebsd.org/D35679
The flag SS_NOFDREF is a private flag of the socket layer. It also
is supposed to be read with SOCK_LOCK(), which we don't own here.
Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D35663
Unlock the inp when hanlding TCP_MD5SIG socket options. tcp_ipsec_pcbctl
handles locking the inp when the option is being modified.
This was found by Claudio Jeker while working on the OpenBGPd port.
On 14 we get a panic when trying to call getsockopt, on 13.1 the process
locks up using 100% CPU.
Reviewed by: rscheff (transport), tuexen
MFC after: 3 days
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D35532
The KASSERT criteria needs to be checked against the
sendbuffer so_snd in a subsequent version.
Reviewed By: tuexen, #transport
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35431
Missed another NULL dereference during KASSERTS after traversing
the scoreboard. While at it, scratch the goto by making the
traversal conditional, and remove duplicate checks using an
unconditional loop with all checks inside.
Reviewed By: hselasky
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35428
Adding a few KASSERT() to validate sanity of sack holes, and
bail out if sack hole is inconsistent to avoid panicing non-invariant builds.
Reviewed By: hselasky, glebius
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35387
By analogy with IP address matching, add a way to use ipfw radix
tables for MAC matching. This is implemented using new ipfw table
with mac:radix type. Also there are src-mac and dst-mac lookup
commands added.
Usage example:
ipfw table 1 create type mac
ipfw table 1 add 11:22:33:44:55:66/48
ipfw add skipto tablearg src-mac 'table(1)'
ipfw add deny src-mac 'table(1, 100)'
ipfw add deny lookup dst-mac 1
Note: sysctl net.link.ether.ipfw=1 should be set to enable ipfw
filtering on L2.
Reviewed by: melifaro
Obtained from: Yandex LLC
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D35103
When the TCP sequence number subtracted is greater than 2**32 minus
the window size, or 2**31 minus the window size, the use of unsigned
long as an intermediate variable, may result in an incorrect retransmit
length computation on all 64-bit platforms.
While at it create a helper macro to facilitate the computation of
the difference between two TCP sequence numbers.
Differential Revision: https://reviews.freebsd.org/D35388
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: NVIDIA Networking
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.
Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after: 2 weeks
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after: 2 weeks