Commit graph

5107 commits

Author SHA1 Message Date
Hans Petter Selasky
c4c4346f5f Extend fixes made in r278103 and r38754 by copying the complete packet
header and not only partial flags and fields. Firewalls can attach
classification tags to the outgoing mbufs which should be copied to
all the new fragments. Else only the first fragment will be let
through by the firewall. This can easily be tested by sending a large
ping packet through a firewall. It was also discovered that VLAN
related flags and fields should be copied for packets traversing
through VLANs. This is all handled by "m_dup_pkthdr()".

Regarding the MAC policy check in ip_fragment(), the tag provided by
the originating mbuf is copied instead of using the default one
provided by m_gethdr().

Tested by:		Karim Fodil-Lemelin <fodillemlinkarim at gmail.com>
MFC after:		2 weeks
Sponsored by:		Mellanox Technologies
PR:			7802
2015-04-02 15:47:37 +00:00
Julien Charbon
033749179f Provide better debugging information in tcp_timer_activate() and
tcp_timer_active()

Differential Revision:	https://reviews.freebsd.org/D2179
Suggested by:		bz
Reviewed by:		jhb
Approved by:		jhb
2015-04-02 14:43:07 +00:00
Gleb Smirnoff
7a742e3744 Provide a comment explaining issues with the counter(9) trick, so that
people won't copy and paste it blindly.

Prodded by:	ian
Sponsored by:	Nginx, Inc.
2015-04-02 14:22:59 +00:00
Bjoern A. Zeeb
1d549750c9 Try to unbreak the build after r280971 by providing the missing
#include header for SYSINIT.
2015-04-02 00:30:53 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Julien Charbon
18832f1fd1 Use appropriate timeout_t* instead of void* in tcp_timer_activate()
Suggested by:		imp
Differential Revision:	https://reviews.freebsd.org/D2154
Reviewed by:		imp, jhb
Approved by:		jhb
2015-03-31 10:17:13 +00:00
Gleb Smirnoff
513635bfaa VNETalize random IP ID engine.
Sponsored by:	Nginx, Inc.
2015-03-28 16:59:57 +00:00
Gleb Smirnoff
1f08c9479f Initialize random IP ID engine via SYSINIT() instead of doing that on
first packet.  This allow to use M_WAITOK and cut down some error handling.

Sponsored by:	Nginx, Inc.
2015-03-28 16:06:46 +00:00
Fabien Thomas
d612b95e23 On multi CPU systems, we may emit successive packets with the same id.
Fix the race by using an atomic operation.

Differential Revision:	https://reviews.freebsd.org/D2141
Obtained from:	emeric.poupon@stormshield.eu
MFC after:	1 week
Sponsored by:	Stormshield
2015-03-27 13:26:59 +00:00
Michael Tuexen
d59909c3e2 Improve the selection of the destination address of SACK chunks.
This fixes
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196755
and is joint work with rrs@.

MFC after: 1 week
2015-03-26 22:05:31 +00:00
Michael Tuexen
a756ffc931 Make sure that we don't free an SCTP shared key too early.
Thanks to Pouyan Sepehrdad from Qualcomm Product Security Initiative
for reporting the issue.
MFC after: 3 days
2015-03-25 22:45:54 +00:00
Michael Tuexen
d9bdc5200a Use the reference count of the right SCTP inp.
Joint work with rrs@

MFC after: 3 days
2015-03-25 21:41:20 +00:00
Michael Tuexen
0426123f75 Fix two bugs which resulted in a screwed up end point list:
* Use a save way to walk throught a list while manipulting it.
* Have to appropiate locks in place.
Joint work with rrs@

MFC after: 3 days
2015-03-24 21:12:45 +00:00
Lawrence Stewart
efca16682d The addition of flowid and flowtype in r280233 and r280237 respectively forgot
to extend the IPv6 packet node format string, which causes a build failure when
SIFTR is compiled with IPv6 support.

Reported by:	Lars Eggert
2015-03-24 15:08:43 +00:00
Michael Tuexen
8427b3fd4f Fix the bug in the handling of fragmented abandoned SCTP user messages reported in
https://code.google.com/p/sctp-refimpl/issues/detail?id=11
Thanks to Lally Singh for reporting it.
MFC after: 3 days
2015-03-24 15:05:36 +00:00
Michael Tuexen
7fd5b4365a Fix an accounting bug related to the per stream chunk counter.
While there, don't refer to a net articifically.

MFC after: 3 days
2015-03-24 14:51:46 +00:00
Michael Tuexen
ca0f81984a When an ICMP message is received and the MTU shrinks, only
mark outstanding chunks for retransmissions.

MFC after: 3 days
2015-03-23 23:34:21 +00:00
Michael Tuexen
d5ec585697 Remove a useless assignment.
MFC after: 1 week
2015-03-23 15:12:02 +00:00
Hiren Panchasara
d0a8b2a5ae Add connection flow type to siftr(4).
Suggested by:	adrian
Sponsored by:	Limelight Networks
2015-03-19 00:23:16 +00:00
Hiren Panchasara
a025fd1487 Add connection flowid to siftr(4).
Reviewed by:	lstewart
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D2089
2015-03-18 23:24:25 +00:00
Adrian Chadd
3b27278218 Correctly const-ify things.
Found by: clang 3.6
2015-03-18 04:40:36 +00:00
Ian Lepore
dcdeb95f09 Go back to using sbuf_new() with a preallocated large buffer, to avoid
triggering an sbuf auto-drain copyout while holding a lock.

Pointed out by:	   jhb
Pointy hat: 	   ian
2015-03-14 23:57:33 +00:00
Ian Lepore
751ccc429d Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:		195668
2015-03-14 18:11:24 +00:00
Andrey V. Elsukov
2530ed9e70 Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value
to obtain IPv4 next hop address in tablearg case.

Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.

Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.

Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.

Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.

Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.

Differential Revision:	https://reviews.freebsd.org/D2015
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-03-13 09:03:25 +00:00
Michael Tuexen
5ba11c4c2e Update a comment to get it aligned with the code change.
Reported by:	brueffer@
2015-03-11 15:40:29 +00:00
Michael Tuexen
975c975bf0 It seems that sb_acc is a better replacement for sb_cc than sb_ccc. At
least it unbreaks the use of select() for SCTP sockets.

MFC after: 3 days
2015-03-11 15:21:39 +00:00
Michael Tuexen
b3bf169ac7 Fix the adaptation of the path state when thresholds are changed
using the SCTP_PEER_ADDR_THLDS socket option.

MFC after: 3 days
2015-03-11 14:25:23 +00:00
Michael Tuexen
3cb3567d7e Keep track on the socket lock state. This fixes a bug showing up on
Mac OS X.

MFC after: 3 days
2015-03-10 22:38:10 +00:00
Michael Tuexen
2bb7e77385 Unlock the stcb when using setsockopt() for the SCTP_PEER_ADDR_THLDS option.
MFC after: 3 days
2015-03-10 21:05:17 +00:00
Michael Tuexen
59b6d5be4e Add a SCTP socket option to limit the cwnd for each path.
MFC after: 1 month
2015-03-10 19:49:25 +00:00
Michael Tuexen
8f290ed51b Fix a typo.
MFC after: 1 week
2015-03-10 09:16:31 +00:00
Julien Charbon
eb96dc3336 In TCP, connect() can return incorrect error code EINVAL
instead of EADDRINUSE or ECONNREFUSED

PR:			https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196035
Differential Revision:	https://reviews.freebsd.org/D1982
Reported by:		Mark Nunberg <mnunberg@haskalah.org>
Submitted by:		Harrison Grundy <harrison.grundy@astrodoggroup.com>
Reviewed by:		adrian, jch, glebius, gnn
Approved by:		jhb
MFC after:		2 weeks
2015-03-09 20:29:16 +00:00
Andrey V. Elsukov
1b5aa92cff lla_lookup() can directly call llentry_free() for static entries
and the last one requires to hold afdata's wlock.

PR:		197096
MFC after:	1 week
2015-03-07 18:33:08 +00:00
Hiroki Sato
11d8451df3 Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet.  This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes.  This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.

Reported by:		hiren
Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D1835
2015-03-02 17:30:26 +00:00
Hans Petter Selasky
9c0f6aa762 Fix a special case in ip_fragment() to produce a more sensible chain
of packets. When the data payload length excluding any headers, of an
outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in
ip_fragment() can kick in to optimise the outgoing payload(s). The
code which was added in r98849 as part of zero copy socket support
assumes that the beginning of any MTU sized payload is aligned to
where a MBUF's "m_data" pointer points. This is not always the case
and can sometimes cause large IPv4 packets, as part of ping replies,
to be split more than needed.

Instead of iterating the MBUFs to figure out how much data is in the
current chain, use the value already in the "m_pkthdr.len" field of
the first MBUF in the chain.

Reviewed by:		ken @
Differential Revision:	https://reviews.freebsd.org/D1893
MFC after:		2 weeks
Sponsored by:		Mellanox Technologies
2015-02-25 13:58:43 +00:00
Xin LI
cfa498d88e Fix integer overflow in IGMP protocol.
Security:	FreeBSD-SA-15:04.igmp
Security:	CVE-2015-1414
Found by:	Mateusz Kocielski, Logicaltrust
Analyzed by:	Marek Kroemeke, Mateusz Kocielski (shm@NetBSD.org) and
		22733db72ab3ed94b5f8a1ffcde850251fe6f466
Submited by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Reviewed by:	bms
2015-02-25 05:42:59 +00:00
Zbigniew Bodek
8018ac153f Change struct attribute to avoid aligned operations mismatch
Previous __alignment(4) allowed compiler to assume that operations are
performed on aligned region. On ARM processor, this led to alignment fault
as shown below:
trapframe: 0xda9e5b10
FSR=00000001, FAR=a67b680e, spsr=60000113
r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
<...>
udp_input+0x264: ldmia r5, {r0-r3, r6}
udp_input+0x268: stmia r12, {r0-r3, r6}

This was due to instructions which do not support unaligned access,
whereas for __alignment(2) compiler replaced ldmia/stmia with some
logically equivalent memcpy operations.
In fact, the assumption that 'struct ip' is always 4-byte aligned
is definitely false, as we have no impact on data alignment of packet
stream received.

Another possible solution would be to explicitely perform memcpy()
on objects of 'struct ip' type, which, however, would suffer from
performance drop, and be merely a problem hiding.

Please, note that this has nothing to do with
ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to
compiler behaviour.

Submitted by:  Wojciech Macek <wma@semihalf.com>
Reviewed by:   glebius, ian
Obtained from: Semihalf
2015-02-24 12:57:03 +00:00
Gleb Smirnoff
26d50672d6 The last userland piece of in_var.h is now 'struct in_aliasreq'. Move
it to the top of the file, and ifdef _KERNEL the rest.
2015-02-19 23:59:27 +00:00
Gleb Smirnoff
e072c794ad Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 23:16:10 +00:00
Gleb Smirnoff
0d159406b6 - Rename 'struct igmp_ifinfo' into 'struct igmp_ifsoftc', since it really
represents a context.
- Preserve name 'struct igmp_ifinfo' for a new structure, that will be stable
  API between userland and kernel.
- Make sysctl_igmp_ifinfo() return the new 'struct igmp_ifinfo', instead of
  old one, which had a bunch of internal kernel structures in it.
- Move all above declarations from in_var.h to igmp_var.h, since they are
  private to IGMP code.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 22:35:23 +00:00
Konstantin Belousov
0c6dcac369 Fix build with KTR after r278978. 2015-02-19 15:41:23 +00:00
Gleb Smirnoff
fd1b2a7c57 Widen _KERNEL ifdef to hide more kernel network stack structures from userland. 2015-02-19 06:24:27 +00:00
Gleb Smirnoff
058e08bea9 Use new struct mbufq instead of struct ifqueue to manage packet queues in
IPv4 multicast code.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:21:02 +00:00
Randall Stewart
2575fbb827 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
Hans Petter Selasky
609752f04f The flowid and hashtype should be copied from the originating packet
when fragmenting IP packets to preserve the order of the packets in a
stream. Else the resulting fragments can be sent out of order when the
hardware supports multiple transmit rings.

Reviewed by:	glebius @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-02-02 17:32:50 +00:00
Hiren Panchasara
ec446b1375 Make syncookie_mac() use 'tcp_seq irs' in computing hash.
This fixes what seems like a simple oversight when the function was added in
r253210.

Reported by:            Daniel Borkmann <dborkman@redhat.com>
                        Florian Westphal <fw@strlen.de>
Differential Revision:  https://reviews.freebsd.org/D1628
Reviewed by:            gnn
MFC after:              1 month
Sponsored by:           Limelight Networks
2015-01-30 17:29:07 +00:00
Michael Tuexen
aec9ef9745 Whitespace change. 2015-01-27 21:30:24 +00:00
Xin LI
6a58f0e913 Fix SCTP stream reset vulnerability.
We would like to acknowledge Gerasimos Dimitriadis who reported
the issue and Michael Tuexen who analyzed and provided the
fix.

Security:	FreeBSD-SA-15:03.sctp
Security:	CVE-2014-8613
Submitted by:	tuexen
2015-01-27 19:35:38 +00:00
Xin LI
38f2a43815 Fix SCTP SCTP_SS_VALUE kernel memory corruption and disclosure vulnerability.
We would like to acknowledge Clement LECIGNE from Google Security Team and
Francisco Falcon from Core Security Technologies who discovered the issue
independently and reported to the FreeBSD Security Team.

Security:	FreeBSD-SA-15:02.kmem
Security:	CVE-2014-8612
Submitted by:	tuexen
2015-01-27 19:35:36 +00:00
John Baldwin
002d455873 Use an sbuf to generate the output of the net.inet.tcp.hostcache.list
sysctl to avoid a possible buffer overflow if the cache grows while the
text is being generated.

PR:		172675
MFC after:	2 weeks
2015-01-25 19:45:44 +00:00