opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-20 21:59:20 -04:00

Author	SHA1	Message	Date
Michael Tuexen	0bfc52bea5	Use the correct length. The wrong one was too large. MFC after: 3 days	2015-11-06 22:08:05 +00:00
Michael Tuexen	179f731bb0	The field sinfo_timetolive should have been sinfo_pr_value. Thanks to Jens Hoelscher for making me aware of the bug. MFC after: 1 week	2015-11-06 14:00:26 +00:00
Michael Tuexen	b70b526d17	Fix typos in field names of struct sctp_extrcvinfo. Provide defines to allow applications to compile. Thanks to Jens Hoelscher for making me aware of the typos. MFC after: 1 week	2015-11-06 13:08:16 +00:00
Steven Hartland	ac19560a34	Add MTU support to carp interfaces MFC after: 2 weeks Sponsored by: Multiplay	2015-11-05 17:23:02 +00:00
George V. Neville-Neil	33872124a5	Replace the fastforward path with tryforward which does not require a sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate)	2015-11-05 07:26:32 +00:00
Hiren Panchasara	054d38e38c	Improve the sysctl node name. X-MFC with: r290122 Sponsored by: Limelight Networks	2015-11-05 02:09:48 +00:00
Andrey V. Elsukov	5dc5a0e0aa	Implement `ipfw internal olist` command to list named objects. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-11-03 10:21:53 +00:00
George V. Neville-Neil	02b90dbf45	Set the proper direction to check for policies in this one case. Pointed out by: eri Sponsored by: Rubicon Communications (Netgate)	2015-10-29 21:26:32 +00:00
Hiren Panchasara	12eeb81fc1	Calculate the correct amount of bytes that are in-flight for a connection as suggested by RFC 6675. Currently different places in the stack try to guess this in suboptimal ways. The main problem is that current calculations don't take sacked bytes into account. Sacked bytes are the bytes receiver acked via SACK option. This is suboptimal because it assumes that network has more outstanding (unacked) bytes than the actual value and thus sends less data by setting congestion window lower than what's possible which in turn may cause slower recovery from losses. As an example, one of the current calculations looks something like this: snd_nxt - snd_fack + sackhint.sack_bytes_rexmit New proposal from RFC 6675 is: snd_max - snd_una - sackhint.sacked_bytes + sackhint.sack_bytes_rexmit which takes sacked bytes into account which is a new addition to the sackhint struct. Only thing we are missing from RFC 6675 is isLost() i.e. segment being considered lost and thus adjusting pipe based on that which makes this calculation a bit on conservative side. The approach is very simple. We already process each ack with sack info in tcp_sack_doack() and extract sack blocks/holes out of it. We'd now also track this new variable sacked_bytes which keeps track of total sacked bytes reported. One downside to this approach is that we may get incorrect count of sacked_bytes if the other end decides to drop sack info in the ack because of memory pressure or some other reasons. But in this (not very likely) case also the pipe calculation would be conservative which is okay as opposed to being aggressive in sending packets into the network. Next step is to use this more accurate pipe estimation to drive congestion window adjustments. In collaboration with: rrs Reviewed by: jason_eggnet dot com, rrs MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D3971	2015-10-28 22:57:51 +00:00
Hiren Panchasara	356c7958a4	Add sysctl tunable net.inet.tcp.initcwnd_segments to specify initial congestion window in number of segments on fly. It is set to 10 segments by default. Remove net.inet.tcp.experimental.initcwnd10 which is now redundant. Also remove the parent node net.inet.tcp.experimental as it's not needed anymore and also because it was not well thought out. Differential Revision: https://reviews.freebsd.org/D3858 In collaboration with: lstewart Reviewed by: gnn (prev version), rwatson, allanjude, wblock (man page) MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks	2015-10-27 09:43:05 +00:00
George V. Neville-Neil	26882b4239	Turning on IPSEC used to introduce a slight amount of performance degradation (7%) for host-to-host TCP connections over 10Gbps links, even when there were no security policies in place. There is no change in performance on 1Gbps network links. Testing GENERIC vs. GENERIC-NOIPSEC vs. GENERIC with this change shows that the new code removes any overhead introduced by having IPSEC always in the kernel. Differential Revision: D3993 MFC after: 1 month Sponsored by: Rubicon Communications (Netgate)	2015-10-27 00:42:15 +00:00
Michael Tuexen	3db4ea954e	When processing a cookie, any mismatch in port numbers or the vtag results in failing the check. This fixes https://github.com/nplab/ETSI-SCTP-Conformance-Testsuite/blob/master/sctp-imh-tests/sctp-imh-i-3-3.pkt MFC after: 1 week	2015-10-26 21:19:49 +00:00
Michael Tuexen	6e9c45e0ee	Use __func__ instead of __FUNCTION__. This allows compiling the userland stack without errors using gcc5. Thanks to saghul for making me aware and providing the patch. MFC after: 1 week	2015-10-19 11:17:54 +00:00
Alexander V. Chernikov	26a6057525	Fix deletion of ifaddr lle entries when deleting prefix from interface in down state. Regression appeared in r287789, where the "prefix has no corresponding installed route" case was forgotten. Additionally, lltable_delete_addr() was called with incorrect byte order (default is network for lltable code). While here, improve comments on given cases and byte order. PR: 203573 Submitted by: phk	2015-10-18 12:26:25 +00:00
Alexander V. Chernikov	f221bcaa06	Remove several compat functions from pre-fib era.	2015-10-17 17:26:44 +00:00
Bjoern A. Zeeb	962d02b00b	Hopefully also unbreak VIMAGE kernels replacing the &V_... with &VNET_NAME(...). Everything else is just a whitespace wrapping change.	2015-10-15 01:44:32 +00:00
Bjoern A. Zeeb	f87ec781ef	Properly define functions without argument and wrap for { for style purposes as followed in the rest of the file. This will hopefully make gcc more happy.	2015-10-14 18:30:04 +00:00
Hiren Panchasara	adf43a9279	Fix an unnecessarily aggressive behavior where mtu clamping begins on first retransmission timeout (rto) when blackhole detection is enabled. Make sure it only happens when the second attempt to send the same segment also fails with rto. Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows the same pattern and gets 2 chances (rto) before further clamping down. Note: RFC4821 doesn't specify implementation details on how this situation should be handled. Differential Revision: https://reviews.freebsd.org/D3434 Reviewed by: sbruno, gnn (previous version) MFC after: 2 weeks Sponsored by: Limelight Networks	2015-10-14 06:57:28 +00:00
Hiren Panchasara	86a996e6bd	There are times when it would be really nice to have a record of the last few packets and/or state transitions from each TCP socket. That would help with narrowing down certain problems we see in the field that are hard to reproduce without understanding the history of how we got into a certain state. This change provides just that. It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is destroyed, the list is freed. I thought this was likely to be more performance-friendly than saving copies of the tcpcb. Plus, with the packets, you should be able to reverse-engineer what happened to the tcpcb. To enable the feature, you will need to compile a kernel with the TCPPCAP option. Even then, the feature defaults to being deactivated. You can activate it by setting a positive value for the number of captured packets. You can do that on either a global basis or on a per-socket basis (via a setsockopt call). There is no way to get the packets out of the kernel other than using kmem or getting a coredump. I thought that would help some of the legal/privacy concerns regarding such a feature. However, it should be possible to add a future effort to export them in PCAP format. I tested this at low scale, and found that there were no mbuf leaks and the peak mbuf usage appeared to be unchanged with and without the feature. The main performance concern I can envision is the number of mbufs that would be used on systems with a large number of sockets. If you save five packets per direction per socket and have 3,000 sockets, that will consume at least 30,000 mbufs just to keep these packets. I tried to reduce the concerns associated with this by limiting the number of clusters (not mbufs) that could be used for this feature. Again, in my testing, that appears to work correctly. Differential Revision: D3100 Submitted by: Jonathan Looney <jlooney at juniper dot net> Reviewed by: gnn, hiren	2015-10-14 00:35:37 +00:00
Michael Tuexen	9372530827	Fix the timeout for INIT retransmissions in the case where RTO_MIN is smaller than RTO_INITIAL. MFC after: 1 week	2015-10-13 18:27:55 +00:00
Gleb Smirnoff	89bc042679	Fix regression from r287779, that bit me. If we call m_pullup() unconditionally, we end up with an mbuf chain of two mbufs, which later in in_arpreply() is rewritten from ARP request to ARP reply and is sent out. Looks like igb(4) (at least mine, and at least at my network) fails on such mbuf chain, so ARP reply doesn't go out on the wire. Thus, make the m_pullup() call conditional, as it is everywhere. Of course, the bug in igb(?) should be investigated, but better first fix the head. And unconditional m_pullup() was suboptimal, anyway.	2015-10-07 13:10:26 +00:00
Hiren Panchasara	62d4443f00	Add a comment specifying how we implement rfc3042. Differential Revision: D3746 MFC after: 1 week Sponsored by: Limelight Networks	2015-10-06 07:46:19 +00:00
Andrey V. Elsukov	f367798498	Take extra reference to security policy before calling crypto_dispatch(). Currently we perform crypto requests for IPSEC synchronous for most of crypto providers (software, aesni) and only VIA padlock calls crypto callback asynchronous. In synchronous mode it is possible, that security policy will be removed during the processing crypto request. And crypto callback will release the last reference to SP. Then upon return into ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already freed request. To prevent this we will take extra reference to SP. PR: 201876 Sponsored by: Yandex LLC	2015-09-30 08:16:33 +00:00
Gleb Smirnoff	794ac42374	When processing ICMP need frag message, ignore the suggested MTU unless it is smaller than the current one for this connection. This is behavior specified by RFC 1191, and this is how original BSD stack behaved, but this was unintentionally regressed in r182851. Reported & tested by: Richard Russo <russor whatsapp.com> Differential Revision: D3567 Sponsored by: Nginx, Inc.	2015-09-30 03:37:37 +00:00
Alexander V. Chernikov	1558cb2448	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
Alexander V. Chernikov	4a336ef40c	rtsock requests for deleting interface address lles started to return EPERM instead of old "ignore-and-return 0" in r287789. This broke arp -da / ndp -cn behavior (they exit on rtsock command failure). Fix this by translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and making arp/ndp ignore these entries in batched delete. MFC after: 2 weeks	2015-09-27 04:54:29 +00:00
Alexander V. Chernikov	8e5aadb617	Replace toe_nd6_resolve() with nd6_resolve(). Reviewed by: np	2015-09-22 19:05:44 +00:00
Alexander V. Chernikov	aa5f023eaf	Unify nd6 state switching by using newly-created nd6_llinfo_setstate() function. The change is mostly mechanical with the following exception: Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT condition was removed as always-true, explicit ND6_LLINFO_NOSTATE -> ND6_LLINFO_INCOMPLETE state transition was removed as duplicate. Reviewed by: ae Sponsored by: Yandex LLC	2015-09-21 11:19:53 +00:00
Gleb Smirnoff	399fbd0ec0	Use proper byteswap macro. This isn't a functional change.	2015-09-17 17:27:49 +00:00
Gleb Smirnoff	db642c8e6e	In tcp_ctlinput() separate the (ip == NULL) block from the rest of the function to reduce so many levels of indentation. Style the lines that got now indentation reduced. No functional change. Checked with: md5	2015-09-16 21:42:33 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differs from new one, then the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Brad Davis	e5fe11011a	Remove redundant 'man page' Reviewed by: allanjude	2015-09-15 21:16:45 +00:00
Hiren Panchasara	550e9d4235	Remove unnecessary tcp state transition call. Differential Revision: D3451 Reviewed by: markj MFC after: 2 weeks Sponsored by: Limelight Networks	2015-09-15 20:04:30 +00:00
Alexander V. Chernikov	eec33ea052	* Improve logging invalid arp messages * Remove redundant check in ip_arpinput Suggested by: glebius MFC after: 2 weeks	2015-09-15 08:50:44 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicit lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Alexander V. Chernikov	deb6bda6e3	* Improve error checking for arp messages. * Clean stale headers from if_ether.c. Reported by: rozhuk.im at gmail.com Reviewed by: ae MFC after: 2 weeks	2015-09-14 10:28:47 +00:00
Hans Petter Selasky	d76d40126e	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
George V. Neville-Neil	5d06879adb	Add DTrace probe points, translators and a corresponding script to provide the TCPDEBUG functionality with pure DTrace. Reviewed by: rwatson MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: D3530	2015-09-13 15:50:55 +00:00
Michael Tuexen	30811e70d9	Fix compilation issue introduced in r287717. Thanks to bz@ for making me aware of it. MFC after: 1 week	2015-09-12 21:23:24 +00:00
Michael Tuexen	6802b0904f	Address a compile warning. MFC after: 1 week	2015-09-12 18:00:06 +00:00
Michael Tuexen	86eda749af	Cleanup the handling of error causes for ERROR chunks. This fixes an inconsistency of the padding handling. The final padding is now considered to be a chunk padding. MFC after: 1 week	2015-09-12 17:08:51 +00:00
Michael Tuexen	e629b9fc56	Ensure that ERROR chunks are always padded by implementing this in the routine, which queues an ERROR chunk, instead of relying on the callers to do so. Since one caller missed this, this actually fixes a bug. MFC after: 1 week	2015-09-11 13:54:33 +00:00
Michael Tuexen	0941640f34	RFC 4960 requires that packets containing an INIT chunk bundled with another chunk are silently discarded. Do so, instead of sending an ABORT. MFC after: 1 week	2015-09-07 14:00:38 +00:00
Allan Jude	32d321fa4a	missed file that should have been included in r287528 PR: 184110 Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Approved by: wblock (mentor)	2015-09-07 02:00:05 +00:00
Adrian Chadd	499baf0aa7	Replace rss_m2cpuid with rss_soft_m2cpuid_v4 for ip_direct_nh.nh_m2cpuid, because the RSS hash may need to be recalculated. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3564	2015-09-06 20:20:48 +00:00
Alexander V. Chernikov	26deb8826c	Do not pass lle to nd6_ns_output(). Use newly-added nd6_llinfo_get_holdsrc() to extract desired IPv6 source from holdchain and pass it to the nd6_ns_output().	2015-09-05 14:14:03 +00:00
Gleb Smirnoff	388909a12a	Use Jenkins hash for TCP syncache. o Unlike xor, in Jenkins hash every bit of input affects virtually every bit of output, thus salting the hash actually works. With xor salting only provides a false sense of security, since if hash(x) collides with hash(y), then of course, hash(x) ^ salt would also collide with hash(y) ^ salt. [1] o Jenkins provides much better distribution than xor, very close to ideal. TCP connection setup/teardown benchmark has shown a 10% increase with default hash size, and with bigger hashes that still provide possibility for collisions. With enormous hash size, when dataset is by an order of magnitude smaller than hash size, the benchmark has shown 4% decrease in performance, which is expected and acceptable. Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1] Benchmarks by: jch Reviewed by: jch, pkelsey, delphij Security: strengthens protection against hash collision DoS Sponsored by: Nginx, Inc.	2015-09-05 10:15:19 +00:00
Gleb Smirnoff	24067db8ca	Make tcp_mtudisc() static and void. No functional changes. Sponsored by: Nginx, Inc.	2015-09-04 12:02:12 +00:00
Michael Tuexen	6fb9db98b3	Don't leak memory in an error case. MFC after: 1 week	2015-09-04 09:24:07 +00:00
Michael Tuexen	59713bbf27	Add a NULL pointer check to silence the clang code analyzer. MFC after: 1 week	2015-09-04 09:22:16 +00:00
Michael Tuexen	aa1cfca969	Fix a bug where two SHUTDOWN_ACK chunks were sent if a SHUTDOWN chunk was received acking all outstanding data.	2015-09-03 22:15:56 +00:00
Julien Charbon	d6de19ac2f	Put r284245 back in place: If at first this fix was seen as a temporary workaround for a callout(9) issue, it turns out it is instead the right way to use callout in mpsafe mode without using callout_drain(). r284245 commit message: Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763	2015-08-30 13:44:39 +00:00
Michael Tuexen	2e2d67945a	Use 5 times RTO.Max as the default for the shutdown guard timer as required by RFC 4960. The sysctl variable can be used to overwrite this. Discussed with: rrs MFC after: 1 week	2015-08-29 17:26:29 +00:00
Michael Tuexen	e92c2a8d6a	Fix the exporting of SCTP association states to userland. Without this, associations in SHUTDOWN-PENDING were never reported correctly. MFC after: 3 weeks	2015-08-29 09:14:32 +00:00
Adrian Chadd	2527ccad2d	Rename rss_soft_m2cpuid() -> rss_soft_m2cpuid_v4() in preparation for an IPv6 version to show up. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 06:58:30 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Bjoern A. Zeeb	a86e5c96af	get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505	2015-08-27 15:27:41 +00:00
Julien Charbon	bcf9b91395	Revert r284245: "Fix a callout race condition introduced in TCP timers callouts with r281599." r281599 fixed a TCP timer race condition, but due to a callout(9) bug it also introduced another race condition workaround-ed with r284245. The callout(9) bug being fixed with r286880, we can now revert the workaround (r284245). Differential Revision: https://reviews.freebsd.org/D2079 (Initial change) Differential Revision: https://reviews.freebsd.org/D2763 (Workaround) Differential Revision: https://reviews.freebsd.org/D3078 (Fix) Sponsored by: Verisign, Inc. MFC after: 2 weeks	2015-08-24 09:30:27 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes a bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m | awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Alexander V. Chernikov	a4141c63c5	Check value return from lle_create() for NULL. This bug sneaked unnoticed in r286722. Reported by: adrian	2015-08-19 21:08:42 +00:00
Julien Charbon	31a7749d4b	Make clear that TIME_WAIT timeout expiration is managed solely by tcp_tw_2msl_scan(). Sponsored by: Verisign, Inc.	2015-08-18 08:27:26 +00:00
Alexander V. Chernikov	0c4210f984	Fix panic when handling non-inet arp message introduced in r286825. Submitted by: delphij	2015-08-18 06:16:19 +00:00
Alexander V. Chernikov	512e30ef9f	Split arpresolve() into fast/slow path. This change isolates the most common case (e.g. successful lookup) from more complicated scenarios. It also (tries to) make code more simple by avoiding retry: cycle. The actual goal is to prepare code to the upcoming change that will allow LL address retrieval without acquiring LLE lock at all. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D3383	2015-08-16 12:23:58 +00:00
Michael Tuexen	faadc1b492	Allow the path MTU to grow up to the outgoing interface MTU. MFC after: 3 days	2015-08-14 14:26:13 +00:00
Alexander V. Chernikov	f3bfa7d1cf	Move lle update code from gigantic ip_arpinput() to separate bunch of functions. The goal is to isolate actual lle updates to permit more fine-grained locking. Do all lle link-level update under AFDATA wlock. Sponsored by: Yandex LLC	2015-08-13 13:38:09 +00:00
Hiren Panchasara	ad389a8c3b	Remove unused TCPTV_SRTTDFLT. We initialize srtt with TCPTV_SRTTBASE when we don't have any rtt estimate. Differential Revision: D3334 Sponsored by: Limelight Networks	2015-08-12 16:08:37 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt argument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Kristof Provost	30edc5385e	tcp_reass_zone is not a VNET variable. This fixes a panic during 'sysctl -a' on VIMAGE kernels. The tcp_reass_zone variable is not VNET_DEFINE() so we can not mark it as a VNET variable (with CTLFLAG_VNET).	2015-08-09 19:07:24 +00:00
Marius Strobl	d2b5ade3f4	Fix compilation after r286458.	2015-08-08 21:42:15 +00:00
Marius Strobl	6e4cd74673	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Alexander V. Chernikov	331dff0737	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
Julien Charbon	079672cb07	Fix a kernel assertion issue introduced with r286227: Avoid too strict INP_INFO_RLOCK_ASSERT checks due to tcp_notify() being called from in6_pcbnotify(). Reported by: Larry Rosenman <ler@lerctr.org> Submitted by: markj, jch	2015-08-08 08:40:36 +00:00
Mark Johnston	8f980c016b	The mbuf parameter to ip_output_pfil() must be an output parameter since pfil(9) hooks may modify the chain. X-MFC-With: r286028	2015-08-03 17:47:02 +00:00
Julien Charbon	ff9b006d61	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
Michael Tuexen	e7e71dd7f3	Don't take the port numbers for packets containing ABORT chunks from a freed mbuf. Just use them from the stcb. MFC after: 3 days	2015-08-02 16:07:30 +00:00
Andrey V. Elsukov	cf14ccb0f7	Remove unneeded #include "opt_inet.h".	2015-07-31 09:02:28 +00:00
Hiren Panchasara	03041aaac8	Update snd_una description to make it more readable. Differential Revision: https://reviews.freebsd.org/D3179 Reviewed by: gnn Sponsored by: Limelight Networks	2015-07-30 19:24:49 +00:00
Ermal Luçi	3c40232395	Avoid double reference decrement when firewalls force relooping of packets. When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues. This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route. Differential Revision: https://reviews.freebsd.org/D3037 Reviewed by: gnn Approved by: gnn(mentor)	2015-07-29 20:10:36 +00:00
Ermal Luçi	d9f2a78249	ip_output normalization and fixes. ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it. Gather all this code together to make it readable and properly handle the reloop cases. Some of the issues identified: M_IP_NEXTHOP is not handled properly in existing code; route reference leaking is possible with a FIB number change; route flags checking is not consistent in the function. Differential Revision: https://reviews.freebsd.org/D3022 Reviewed by: gnn Approved by: gnn(mentor) MFC after: 4 weeks	2015-07-29 18:04:01 +00:00
Patrick Kelsey	4741bfcb57	Revert r265338, r271089 and r271123 as those changes do not handle non-inline urgent data and introduce an mbuf exhaustion attack vector similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs. Address the issue described in FreeBSD-SA-15:15.tcp. Reviewed by: glebius Approved by: so Approved by: jmallett (mentor) Security: FreeBSD-SA-15:15.tcp Sponsored by: Norse Corp, Inc.	2015-07-29 17:59:13 +00:00
Andrey V. Elsukov	10a0e0bf0a	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Michael Tuexen	9ae56375af	Fix a typo reported by Erik Cederstrand. MFC after: 1 week	2015-07-28 08:50:13 +00:00
Michael Tuexen	267dbe63a1	Provide consistent error causes whenever an ABORT chunk is sent. MFC after: 1 week	2015-07-27 22:35:54 +00:00
Michael Tuexen	cf9e47b2f0	Improve locking on Mac OS X. This does not change the functionality on FreeBSD. Reviewed by: rrs MFC after: 1 week	2015-07-26 10:37:40 +00:00
Michael Tuexen	6247db3541	Fix and improve a debug message. The SID was reported as an SSN. MFC after: 1 week	2015-07-26 10:17:17 +00:00
Michael Tuexen	4ff815b71c	Move including netinet/icmp6.h around to avoid a problem when including netinet/icmp6.h and net/netmap.h. Both use ni_flags... This allows to build multistack with SCTP support. MFC after: 1 week	2015-07-25 18:26:09 +00:00
Kristof Provost	fc4443a1d5	Remove stale comment. The IPv6 pseudo header checksum was added by bz in r235961. Sponsored by: Essen FreeBSD Hackathon	2015-07-25 16:14:55 +00:00
Randall Stewart	5f98acb594	Fix silly syntax error emacs chugged in for me.. gesh. MFC after: 3 weeks	2015-07-24 14:13:43 +00:00
Randall Stewart	c616859963	Fix an issue with MAC OS locking and also optimize the case where we are sending back a stream-reset and a sack timer is running, in that case we should just send the SACK. MFC after: 3 weeks	2015-07-24 14:09:03 +00:00
Randall Stewart	7cca17758c	Fix several problems with Stream Reset. 1) We were not handling (or sending) the IN_PROGRESS case if the other side (or our side) was not able to reset (awaiting more data). 2) We would improperly send a stream-reset when we should not. Not waiting until the TSN had been assigned when data was inqueue. Reviewed by: tuexen	2015-07-22 11:30:37 +00:00
Xin LI	47a8e86509	Fix resource exhaustion due to sessions stuck in LAST_ACK state. Submitted by: Jonathan Looney (Juniper SIRT) Reviewed by: lstewart Security: CVE-2015-5358 Security: SA-15:13.tcp	2015-07-21 23:42:15 +00:00
Ermal Luçi	705f4d9c6a	IPSEC, remove variable argument function, it's already due. Differential Revision: https://reviews.freebsd.org/D3080 Reviewed by: gnn, ae Approved by: gnn(mentor)	2015-07-21 21:46:24 +00:00

5377 commits