opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-06-04 06:15:33 -04:00

Author	SHA1	Message	Date
Richard Scheffenegger	c21b7b55be	tcp: finish SACK loss recovery on sudden lack of SACK blocks While a receiver should continue sending SACK blocks for the duration of a SACK loss recovery, if for some reason the TCP options no longer contain these SACK blocks, but we already started maintaining the Scoreboard, keep on handling incoming ACKs (without SACK) as belonging to the SACK recovery. Reported by: thj Reviewed by: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36046	2022-08-31 14:49:47 +02:00
Gleb Smirnoff	c00605751e	tcp: remove a dead code leftover from T/TCP, that doesn't have any value today.	2022-08-29 19:30:12 -07:00
Gleb Smirnoff	d9f6ac882a	protosw: retire PRU_ flags and their char names For many years only TCP debugging used them, but relatively recently TCP DTrace probes also start to use them. Move their declarations into tcp_debug.h, but start including tcp_debug.h unconditionally, so that compilation with DTrace and without TCPDEBUG is possible.	2022-08-17 11:50:32 -07:00
Gleb Smirnoff	d88eb4654f	tcp: address a wire level race with 2 ACKs at the end of TCP handshake Imagine we are in SYN-RCVD state and two ACKs arrive at the same time, both valid, e.g. coming from the same host and with valid sequence. First packet would locate the listening socket in the inpcb database, write-lock it and start expanding the syncache entry into a socket. Meanwhile second packet would wait on the write lock of the listening socket. First packet will create a new ESTABLISHED socket, free the syncache entry and unlock the listening socket. Second packet would call into syncache_expand(), but this time it will fail as there is no syncache entry. Second packet would generate RST, effectively resetting the remote connection. It seems to me, that it is impossible to solve this problem with just rearranging locks, as the race happens at a wire level. To solve the problem, for an ACK packet arrived on a listening socket, that failed syncache lookup, perform a second non-wildcard lookup right away. That lookup may find the new born socket. Otherwise, we indeed send RST. Tested by: kp Reviewed by: tuexen, rrs PR: 265154 Differential revision: https://reviews.freebsd.org/D36066	2022-08-10 07:32:37 -07:00
Gleb Smirnoff	e7231d07a4	tcp_input: update comment to match reality.	2022-08-07 11:18:30 -07:00
Gleb Smirnoff	74703901d8	tcp: use a TCP flag to check if connection has been close(2)d The flag SS_NOFDREF is a private flag of the socket layer. It also is supposed to be read with SOCK_LOCK(), which we don't own here. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D35663	2022-07-04 12:40:51 -07:00
Gleb Smirnoff	ad3ad06477	blackhole(4): fix operator precedence Fixes: `3ea9a7cf7b`	2022-06-27 17:52:19 -07:00
Hans Petter Selasky	f5766992c0	tcp: Correctly compute the TCP goodput in bits per second by using SEQ_SUB(). TCP sequence number differences should be computed using SEQ_SUB(). Differential Revision: https://reviews.freebsd.org/D35505 Reviewed by: rscheff@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-06-23 21:10:39 +02:00
Gleb Smirnoff	4328318445	sockets: use socket buffer mutexes in struct socket directly Since `c67f3b8b78` the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In `74a68313b5` macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer. This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer. This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152	2022-05-12 13:22:12 -07:00
Richard Scheffenegger	f7220c486c	tcp: move ECN handling code to a common file Reduce the burden to maintain correct and extensible ECN related code across multiple stacks and codepaths. Formally no functional change. Incidentially this establishes correct ECN operation in one instance. Reviewed By: rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34162	2022-02-05 15:04:42 +01:00
Richard Scheffenegger	7994ef3c39	Revert "tcp: move ECN handling code to a common file" This reverts commit `0c424c90ea`.	2022-02-05 01:07:51 +01:00
Richard Scheffenegger	0c424c90ea	tcp: move ECN handling code to a common file Reduce the burden to maintain correct and extensible ECN related code across multiple stacks and codepaths. Formally no functional change. Incidentially this establishes correct ECN operation in one instance. Reviewed By: rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34162	2022-02-04 22:54:41 +01:00
Richard Scheffenegger	1ebf460758	tcp: Access all 12 TCP header flags via inline function In order to consistently provide access to all (including reserved) TCP header flag bits, use an accessor function tcp_get_flags and tcp_set_flags. Also expand any flag variable from uint8_t / char to uint16_t. Reviewed By: hselasky, tuexen, glebius, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34130	2022-02-03 16:21:58 +01:00
Richard Scheffenegger	4531b3450b	tcp: Tidying up the conditionals for unwinding a spurious RTO - Use the semantically correct TSTMP_xx macro when comparing timestamps. (No functional change) - check for bad retransmits only when TSopt is present in ACK (don't assume there will be a valid TSopt in the TCP options struct) - exclude tsecr == 0, since that most likely indicates an invalid ts echo return (tsecr) value. Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34062	2022-01-27 18:59:55 +01:00
Richard Scheffenegger	68e623c3f0	tcp: Rewind erraneous RTO only while performing RTO retransmissions Under rare circumstances, a spurious retranmission is incorrectly detected and rewound, messing up various tcpcb values, which can lead to a panic when SACK is in use. Reviewed By: tuexen, chengc_netapp.com, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D33979	2022-01-27 18:49:42 +01:00
Gleb Smirnoff	40fa3e40b5	tcp: mechanically substitute call to tfb_tcp_output to new method. Made with sed(1) execution: sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/) sed: s/tp->t_fb->tfb_tcp_output$tp$/tcp_output(tp)/ s/to tfb_tcp_output/to tcp_output()/ Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366	2021-12-26 08:47:59 -08:00
Gleb Smirnoff	75add59a8e	tcp: allocate statistics in the main tcp_init() No reason to have a separate SYSINIT.	2021-12-17 10:50:56 -08:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	de2d47842e	SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	3ea9a7cf7b	blackhole(4): disable for locally originated TCP/UDP packets In most cases blackholing for locally originated packets is undesired, leads to different kind of lags and delays. Provide sysctls to enforce it, e.g. for debugging purposes. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32718	2021-11-03 13:02:44 -07:00
Richard Scheffenegger	74d7fc8753	tcp: Add PRR cwnd reduction for non-SACK loss This completes PRR cwnd reduction in all circumstances for the base TCP stack (SACK loss recovery, ECN window reduction, non-SACK loss recovery), preventing the arriving ACKs to clock out new data at the old, too high rate. This reduces the chance to induce additional losses while recovering from loss (during congested network conditions). For non-SACK loss recovery, each ACK is assumed to have one MSS delivered. In order to prevent ACK-split attacks, only one window worth of ACKs is considered to actually have delivered new data. MFC after: 6 weeks Reviewed By: rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29441	2021-06-19 19:25:22 +02:00
Mark Johnston	f4bb1869dd	Consistently use the SOLISTENING() macro Some code was using it already, but in many places we were testing SO_ACCEPTCONN directly. As a small step towards fixing some bugs involving synchronization with listen(2), make the kernel consistently use SOLISTENING(). No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-06-14 17:32:27 -04:00
Randall Stewart	4747500dea	tcp: A better fix for the previously attempted fix of the ack-war issue with tcp. So it turns out that my fix before was not correct. It ended with us failing some of the "improved" SYN tests, since we are not in the correct states. With more digging I have figured out the root of the problem is that when we receive a SYN\|FIN the reassembly code made it so we create a segq entry to hold the FIN. In the established state where we were not in order this would be correct i.e. a 0 len with a FIN would need to be accepted. But if you are in a front state we need to strip the FIN so we correctly handle the ACK but ignore the FIN. This gets us into the proper states and avoids the previous ack war. I back out some of the previous changes but then add a new change here in tcp_reass() that fixes the root cause of the issue. We still leave the rack panic fixes in place however. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30627	2021-06-04 05:26:43 -04:00
Randall Stewart	8c69d988a8	tcp: When we have an out-of-order FIN we do want to strip off the FIN bit. The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr). However there was a missing case, i.e. where we get an out-of-order FIN by itself. In such a case we don't want to leave the FIN bit set, otherwise we will do the wrong thing and ack the FIN incorrectly. Instead we need to go through the tcp_reasm() code and that way the FIN will be stripped and all will be well. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30497	2021-05-27 10:50:32 -04:00
Randall Stewart	13c0e198ca	tcp: Fix bugs related to the PUSH bit and rack and an ack war Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed in the last commit. The problem is the left edge gets transmitted before the adjustments are done to the send_map, this means that right edge bits must be considered to be added only if the entire RSM is being retransmitted. Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself. After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically what happens is we go into the reassembly code and lose the FIN bit. The trick here is we should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything. That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no timers running. This is because the usrclosed function gets called and the FIN's and such have already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2 timer can speed this up in testing. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30451	2021-05-25 13:23:31 -04:00
Richard Scheffenegger	032bf749fd	[tcp] Keep socket buffer locked until upcall r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client components (MP threads) accessing the socket buffer with potentially not correctly updated state. Reported by: rmacklem Reviewed By: tuexen, #transport Tested by: rmacklem, otis MFC after: 2 weeks Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29690	2021-05-21 11:07:51 +02:00
Richard Scheffenegger	0471a8c734	tcp: SACK Lost Retransmission Detection (LRD) Recover from excessive losses without reverting to a retransmission timeout (RTO). Disabled by default, enable with sysctl net.inet.tcp.do_lrd=1 Reviewed By: #transport, rrs, tuexen, #manpages Sponsored by: Netapp, Inc. Differential Revision: https://reviews.freebsd.org/D28931	2021-05-10 19:06:20 +02:00
Randall Stewart	5d8fd932e4	This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features are now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents comming soon on rack.. Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036	2021-05-06 11:22:26 -04:00
Richard Scheffenegger	48be5b976e	tcp: stop spurious rescue retransmissions and potential asserts Reported by: pho@ MFC after: 3 days Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29970	2021-04-28 15:01:10 +02:00
Gleb Smirnoff	1db08fbe3f	tcp_input: always request read-locking of PCB for any pure SYN segment. This is further rework of `08d9c92027`. Now we carry the knowledge of lock type all the way through tcp_input() and also into tcp_twcheck(). Ideally the rlocking for pure SYNs should propagate all the way into the alternative TCP stacks, but not yet today. This should close a race when socket is bind(2)-ed but not yet listen(2)-ed and a SYN-packet arrives racing with listen(2), discovered recently by pho@.	2021-04-20 10:02:20 -07:00
Gleb Smirnoff	7b5053ce22	tcp_input: remove comments and assertions about tcpbinfo locking They aren't valid since `d40c0d47cd`.	2021-04-20 10:02:20 -07:00
Michael Tuexen	9e644c2300	tcp: add support for TCP over UDP Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special priviledges or specific support by the OS. This is joint work with rrs. Reviewed by: rrs Sponsored by: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29469	2021-04-18 16:16:42 +02:00
Richard Scheffenegger	d1de2b05a0	tcp: Rename rfc6675_pipe to sack.revised, and enable by default As full support of RFC6675 is in place, deprecating net.inet.tcp.rfc6675_pipe and enabling by default net.inet.tcp.sack.revised. Reviewed By: #transport, kbowling, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28702	2021-04-17 14:59:45 +02:00
Gleb Smirnoff	8d5719aa74	syncache: simplify syncache_add() KPI to return struct socket pointer directly, not overwriting the listen socket pointer argument. Not a functional change.	2021-04-12 08:27:40 -07:00
Gleb Smirnoff	08d9c92027	tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets When packet is a SYN packet, we don't need to modify any existing PCB. Normally SYN arrives on a listening socket, we either create a syncache entry or generate syncookie, but we don't modify anything with the listening socket or associated PCB. Thus create a new PCB lookup mode - rlock if listening. This removes the primary contention point under SYN flood - the listening socket PCB. Sidenote: when SYN arrives on a synchronized connection, we still don't need write access to PCB to send a challenge ACK or just to drop. There is only one exclusion - tcptw recycling. However, existing entanglement of tcp_input + stacks doesn't allow to make this change small. Consider this patch as first approach to the problem. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D29576	2021-04-12 08:25:31 -07:00
Richard Scheffenegger	90cca08e91	tcp: Prepare PRR to work with NewReno LossRecovery Add proper PRR vnet declarations for consistency. Also add pointer to tcpopt struct to tcp_do_prr_ack, in preparation for it to deal with non-SACK window reduction (after loss). No functional change. MFC after: 2 weeks Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29440	2021-04-08 19:16:31 +02:00
Richard Scheffenegger	b9f803b7d4	tcp: Use PRR for ECN congestion recovery MFC after: 2 weeks Reviewed By: #transport, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28972	2021-03-26 02:06:15 +01:00
Richard Scheffenegger	eb3a59a831	tcp: Refactor PRR code No functional change intended. MFC after: 2 weeks Reviewed By: #transport, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29411	2021-03-26 00:01:34 +01:00
Richard Scheffenegger	0533fab89e	tcp: Perform simple fast retransmit when SACK Blocks are missing on SACK session MFC after: 2 weeks Reviewed By: #transport, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28634	2021-03-25 23:23:48 +01:00
Michael Tuexen	40f41ece76	tcp: improve handling of SYN segments in SYN-SENT state Ensure that the stack does not generate a DSACK block for user data received on a SYN segment in SYN-SENT state. Reviewed by: rscheff MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D29376 Sponsored by: Netflix, Inc.	2021-03-22 15:58:49 +01:00
Richard Scheffenegger	e53138694a	tcp: Add prr_out in preparation for PRR/nonSACK and LRD Reviewed By: #transport, kbowling MFC after: 3 days Sponsored By: Netapp, Inc. Differential Revision: https://reviews.freebsd.org/D29058	2021-03-06 00:38:22 +01:00
Richard Scheffenegger	4a8f3aad37	tcp: remove incorrect reset of SACK variable in PRR Reviewed By: #transport, rrs, tuexen PR: 253848 MFC after: 3 days Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29083	2021-03-05 17:45:54 +01:00
Kristof Provost	bb4a7d94b9	net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header. Use them where appropriate. Reviewed by: ae (previous version), rscheff, tuexen, rgrimes MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056	2021-03-04 20:56:48 +01:00
Richard Scheffenegger	0b0f8b359d	calculate prr_out correctly when pipe < ssthresh Reviewed By: #transport, tuexen MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28998	2021-03-01 16:26:05 +01:00
Richard Scheffenegger	e9071000c9	Improve PRR initial transmission timing Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28953	2021-02-28 15:46:54 +01:00
Richard Scheffenegger	9e83a6a556	Include new data sent in PRR calculation Reviewed By: #transport, kbowling MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28941	2021-02-26 22:31:58 +01:00
Richard Scheffenegger	2593f858d7	A TCP server has to take into consideration, if TCP_NOOPT is preventing the negotiation of TCP features. This affects most TCP options but adherance to RFC7323 with the timestamp option will prevent a session from getting established. PR: 253576 Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28652	2021-02-25 19:12:20 +01:00
Richard Scheffenegger	31d7a27c6e	PRR: Avoid accounting left-edge twice in partial ACK. Reviewed By: #transport, kbowling MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28819	2021-02-25 18:37:47 +01:00
Richard Scheffenegger	48396dc779	Address two incorrect calculations and enhance readability of PRR code - address second instance of cwnd potentially becoming zero - fix sublte bug due to implicit int to uint typecase in max() - fix bug due to typo in hand-coded CEILING() function by using howmany() macro - use int instead of long, and add a missing long typecast - replace if conditionals with easier to read imax/imin (as in pseudocode) Reviewed By: #transport, kbowling MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D28813	2021-02-25 18:32:04 +01:00

1 2 3 4 5 ...

688 commits