4718 commits

Author SHA1 Message Date
Stefan Eßer
dc4114875e Make CPU_SET macros compliant with other implementations
(cherry picked from commit e2650af157)
2022-01-14 18:17:30 +02:00
Vincenzo Maffione
6000a417d3 net: iflib: sync isc_capenable to if_capenable
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.

PR:		260068
Reviewed by:	kbowling, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33156

(cherry picked from commit 4561c4f0ca)
2022-01-06 09:53:08 +00:00
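The bit synchronisation described in 6000a417d3 can be illustrated with a small userland sketch. It assumes the private copy tracks only a subset of the capability bits; the function name `sync_private_caps` and the `tracked` mask are hypothetical and are not taken from iflib.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the old and new values of the full
 * capability word, apply the bits that were toggled to a private copy,
 * restricted to the bits that the private copy tracks.
 */
static uint32_t
sync_private_caps(uint32_t priv, uint32_t old_caps, uint32_t new_caps,
    uint32_t tracked)
{
	uint32_t toggled = old_caps ^ new_caps;	/* bits that changed */

	return (priv ^ (toggled & tracked));
}
```

Bits outside `tracked` never reach the private copy, so a reader of the copy only ever sees the subset it is meant to mirror.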
Kristof Provost
80c2f5fc0a if_pflog: fix packet length
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).

The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).

Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.

Reported by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
Tested by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33236

(cherry picked from commit 6d4baa0d01)
2021-12-11 10:38:50 +01:00
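The length rule from 80c2f5fc0a can be sketched in userland C. This only illustrates rounding a length up to a multiple of an alignment such as sizeof(long); the helper names `roundup_len` and `pad_bytes` are hypothetical and this is not the pflog code itself.

```c
#include <stddef.h>

/*
 * Hypothetical helper: round len up to the next multiple of align.
 * align must be non-zero.
 */
static size_t
roundup_len(size_t len, size_t align)
{
	return (((len + align - 1) / align) * align);
}

/* Number of padding bytes needed to reach that multiple. */
static size_t
pad_bytes(size_t len, size_t align)
{
	return (roundup_len(len, align) - len);
}
```

For a 65-byte length, a 4-byte alignment yields 68 bytes while an 8-byte alignment yields 72, so the two assumptions disagree wherever long is wider than uint32_t.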
Zhenlei Huang
5346a9a2e0 if_epair: Also mark the flag of pair b with IFF_KNOWSEPOCH
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33210

(cherry picked from commit 73d41cc730)
2021-12-11 10:38:17 +01:00
Andriy Gapon
5f24d2a82c iflib_stop: drain rx tasks to prevent any data races
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists.  So, iflib_stop has to ensure that the
rx task threads are not active.

This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.

The crash has been seen on VMWare guests with vmxnet3 driver.

My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.

But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.

PR:		259458

(cherry picked from commit 1bfdb812c7)
2021-12-10 14:32:37 +02:00
Alexander V. Chernikov
6ef62def60 routing: Use the same index space for both nexthop and nexthop groups.
This simplifies userland object handling along with kernel-level
 nexthop handling in fib algo framework.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D32342

(cherry picked from commit 7e64580b5f)
2021-12-04 19:03:05 +00:00
Alexander V. Chernikov
b9772822a6 routing: fix source address selection rules for IPv4 over IPv6.
Current logic always selects an IFA of the same family from the
 outgoing interfaces. In an IPv4 over IPv6 setup there can be just
 a single non-127.0.0.1 ifa, attached to the loopback interface.

Create a separate rt_getifa_family() to handle the entire ifa selection
 for IPv4 over IPv6.

Differential Revision: https://reviews.freebsd.org/D31868
MFC after:	1 week

(cherry picked from commit 4b631fc832)
2021-12-04 19:02:52 +00:00
Kristof Provost
563b1596fc if_stf: style(9) pass
As stated in style(9): "Values in return statements should be enclosed
in parentheses."

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32848

(cherry picked from commit 3576121c8b)
2021-12-01 16:53:19 +01:00
Kristof Provost
2b0a7984fe if_stf: enable use in vnet jails
The cloner must be per-vnet so that cloned interfaces get destroyed when
the vnet goes away. Otherwise we fail assertions in vnet_if_uninit():

	panic: vnet_if_uninit:475 tailq &V_ifnet=0xfffffe01665fe070 not empty
	cpuid = 19
	time = 1636107064
	KDB: stack backtrace:
	db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d0cac60
	vpanic() at vpanic+0x187/frame 0xfffffe015d0cacc0
	panic() at panic+0x43/frame 0xfffffe015d0cad20
	vnet_if_uninit() at vnet_if_uninit+0x7b/frame 0xfffffe015d0cad30
	vnet_destroy() at vnet_destroy+0x170/frame 0xfffffe015d0cad60
	prison_deref() at prison_deref+0x9b0/frame 0xfffffe015d0cadd0
	sys_jail_remove() at sys_jail_remove+0x119/frame 0xfffffe015d0cae00
	amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d0caf30
	fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d0caf30
	--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x8011e920a, rsp = 0x7fffffffe788, rbp = 0x7fffffffe810 ---
	KDB: enter: panic

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32849

(cherry picked from commit 8e45fed3ae)
2021-12-01 16:53:19 +01:00
Kristof Provost
60c3b9a78a if_gif: fix vnet shutdown panic
If an if_gif exists and has an address assigned inside a vnet when the
vnet is shut down we failed to clean up the address, leading to a panic
when we ip_destroy() and the V_in_ifaddrhashtbl is not empty.

This happens because of the VNET_SYS(UN)INIT order, which means we
destroy the if_gif interface before the addresses can be purged (and
if_detach() does not remove addresses, it assumes this will be done by
the stack teardown code).

Set subsystem SI_SUB_PSEUDO just like if_bridge so the cleanup
operations happen in the correct order.

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32835

(cherry picked from commit 8ca6c11a7c)
2021-11-29 15:44:39 +01:00
Kristof Provost
029aed9281 lagg: fix unused-but-set-variable
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 3142d4f622)
2021-11-26 04:40:56 +01:00
Kristof Provost
cfe9b890d5 pf: Introduce ridentifier
Allow users to set a number on rules which will be exposed as part of
the pflog header.
The intent behind this is to allow users to correlate rules across
updates (remember that pf rules continue to exist and match existing
states, even if they're removed from the active ruleset) and pflog.

Obtained from:	pfSense
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32750

(cherry picked from commit 76c5eecc34)
2021-11-26 04:39:05 +01:00
Bjoern A. Zeeb
f4aba8c9f0 if_epair: rework
Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.

Work largely by bz@, with minor tweaks by kp@.

Reviewed by:	bz, kp
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D31077

(cherry picked from commit 3dd5760aa5)
2021-11-23 16:50:51 +01:00
Kristof Provost
6d20b6de6a if_epair: delete mbuf tags
Remove all (non-persistent) tags when we transmit a packet. Real network
interfaces do not carry any tags either, and leaving tags attached can
produce unexpected results.

Reviewed by:	bz, glebius
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32663

(cherry picked from commit 62d2dcafb7)
2021-11-19 06:51:58 +01:00
Bjoern A. Zeeb
9806b36dee epair: remove "All rights reserved"
Remove "All rights reserved" from The FreeBSD Foundation owned
copyrights on epair code and documentation.

Approved by:	emaste (FreeBSD Foundation)

(cherry picked from commit 1a8f198fa6)
2021-11-19 00:01:27 +00:00
Kristof Provost
00ff2b29a9 pf: remove unused field from pf_kanchor
The 'match' field is only used in the userspace version of the struct
(pf_anchor).

MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 76c2e71c4c)
2021-10-22 09:34:08 +02:00
Kristof Provost
c5a340e864 pfctl: userspace adaptive syncookies configuration
Hook up the userspace bits to configure syncookies in adaptive mode.

MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D32136

(cherry picked from commit 5062afff9d)
2021-10-06 10:46:54 +02:00
Kristof Provost
dc23abfdea pf: implement adaptive mode
Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.

MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D32134

(cherry picked from commit bf8637181a)
2021-10-06 10:46:53 +02:00
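The idea in dc23abfdea, an atomically maintained counter that decides when adaptive mode switches on or off, can be sketched as follows. The two thresholds (`hiwat`, `lowat`), the struct, and all function names are assumptions made for illustration; the commit message does not spell out the actual limits or data structures.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tracker; not the pf data structure. */
struct adaptive {
	atomic_uint half_open;	/* half-open states currently tracked */
	unsigned int hiwat;	/* assumed activation threshold */
	unsigned int lowat;	/* assumed deactivation threshold */
	bool active;		/* whether syncookies are currently on */
};

static void
state_opened(struct adaptive *a)
{
	atomic_fetch_add(&a->half_open, 1);
}

static void
state_closed(struct adaptive *a)
{
	atomic_fetch_sub(&a->half_open, 1);
}

/* Re-evaluate the mode from the counter; returns the new setting. */
static bool
adaptive_update(struct adaptive *a)
{
	unsigned int n = atomic_load(&a->half_open);

	if (!a->active && n > a->hiwat)
		a->active = true;
	else if (a->active && n < a->lowat)
		a->active = false;
	return (a->active);
}
```

Using two thresholds keeps the mode from flapping when the count hovers around a single limit, and the atomic updates keep the count consistent when several threads open and close states concurrently.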
Vincenzo Maffione
d61f959961 netmap: monitor: add a flag to distinguish packet direction
The netmap monitor intercepts any TX/RX packets on the monitored
port. However, before this change there was no way to tell
whether an intercepted packet was being transmitted or received
on the monitored port.
A TXMON flag in the netmap slot has been added for this purpose.

(cherry picked from commit 660a47cb99)
2021-09-26 14:00:04 +00:00
Mark Johnston
7698572910 debugnet: Include some required headers
Don't depend on pollution from net/vnet.h.

PR:		258496
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit b1746faad6)
2021-09-21 09:37:42 -04:00
Kristof Provost
d073473bc6 pf: qid and pqid can be uint16_t
tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.

Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31873

(cherry picked from commit b64f7ce98f)
2021-09-17 17:55:42 +02:00
Mark Johnston
cdb6f3d416 net: Enter a net epoch around protocol if_up/down notifications
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.

Reported by:	syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by:	kp, melifaro
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit b1e6a792d6)
2021-09-17 09:13:09 -04:00
Kristof Provost
8aeafec21a pf: remove unused function prototype
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit bb25e36e13)
2021-09-14 22:00:24 +02:00
Kristof Provost
9136dfca19 pf: Add counters for syncookies
Count when we send a syncookie, receive a valid syncookie or detect a
synflood.

Reviewed by:	kbowling
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31713

(cherry picked from commit 4cab80a8df)
2021-09-08 09:28:14 +02:00
Alexander V. Chernikov
f3d6900337 routing: Bring back the ability to specify transmit interface via its name.
Some software references outgoing interfaces by specifying name instead of
 index.

Use rti_ifp from rt_addrinfo if provided instead of always using
 address interface when constructing nexthop.

PR: 		255678
Reported by:	martin.larsson2 at gmail.com

(cherry picked from commit d98954e229)
2021-09-07 21:25:24 +00:00
Zhenlei Huang
e8df60a69a routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
 for IPv4 routes. This behavior is controlled by the
 net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to extract packet family inside if_output() routines.
 Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
 It leverages recent lltable changes committed in c541bd368f.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
  route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398

(cherry picked from commit 62e1a437f3)
2021-09-07 21:25:06 +00:00
Alexander V. Chernikov
e86f5d4fcb routing: Disallow zero nexthop weights in nexthop groups.
Adding such nexthops breaks calc_min_mpath_slots() assumptions,
 thus resulting in the incorrect nexthop group creation and
 eventually leading to panic.
Reported by:	avg

(cherry picked from commit 0a3a377aee)
2021-09-07 21:02:59 +00:00
Alexander V. Chernikov
8c73907c66 routing: simplify malloc flags in alloc_nhgrp().
(cherry picked from commit 639d7abec6)
2021-09-07 21:02:59 +00:00
Alexander V. Chernikov
0e77fc2a79 routing: Fix newly-added rt_get_inet[6]_parent() api.
Correctly handle the case when no default route is present.

Reported by:	Konrad <konrad.kreciwilk at korbank.pl>

(cherry picked from commit f84c30106e)
2021-09-07 21:02:59 +00:00
Alexander V. Chernikov
48f38f47b1 lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.
Currently we use pre-calculated headers inside LLE entries as prepend data
 for `if_output` functions. Using these headers allows saving some
 CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
 nexthops more complex, as it is not possible to store multiple
 pre-calculated headers inside lle. Additionally, the solution space is
 limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
 custom family encaps and store mbufs pending resolution. To simplify handling
 of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
 Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
 to the "parent" LLE - it is not possible to delete "child" when parent is alive.
 Furthermore, "child" LLEs are static (RTF_STATIC), avoiding the complex state
 machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accept an additional argument, family,
 allowing such child LLEs to be returned. This change uses the `LLE_SF()` macro, which
 packs family and flags in a single int field. This is done to simplify merging
 back to stable/. Once this code lands, most of the cases will be converted to
 use a dedicated `family` parameter.

Differential Revision: https://reviews.freebsd.org/D31379

(cherry picked from commit c541bd368f)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
5007bc4e13 routing: Fix crashes with dpdk_lpm[46] algo.
When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
 the nexthop of the "parent" prefix to update its internal state.
The glue code, which utilises RIB as a backing route store, uses
 fib[46]_lookup_rt() for the prefix destination after its deletion
 to fetch the desired nexthop.
This approach does not work when deleting less-specific prefixes
 while more-specific ones are still present. For example, if
 10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
 10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
 result instead of 10.0.0.0/22. This, in turn, results in the failed
 datastructure update: part of the deleted /23 prefix will still
 contain the reference to an old nexthop. This leads to the
 use-after-free behaviour, ending with the eventual crashes.

Fix the logic flaw by properly fetching the prefix "parent" via
 newly-created rt_get_inet[6]_parent() helpers.

Differential Revision: https://reviews.freebsd.org/D31546
PR:	256882,256833

(cherry picked from commit 36e15b717e)
2021-09-07 21:02:58 +00:00
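The flaw fixed in 5007bc4e13 can be reproduced with a toy prefix table. The sketch below is userland C with a linear scan, not the RIB or dpdk_lpm code; `lookup` and `find_parent` are hypothetical names standing in for an address lookup and a parent-prefix lookup respectively.

```c
#include <stddef.h>
#include <stdint.h>

struct prefix {
	uint32_t addr;	/* network address, host byte order */
	int plen;	/* prefix length, 0..32 */
};

static uint32_t
plen_mask(int plen)
{
	return (plen == 0 ? 0 : ~(uint32_t)0 << (32 - plen));
}

/* Longest-prefix match for a single address. */
static const struct prefix *
lookup(const struct prefix *tbl, size_t n, uint32_t addr)
{
	const struct prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((addr & plen_mask(tbl[i].plen)) != tbl[i].addr)
			continue;
		if (best == NULL || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return (best);
}

/* Longest prefix that covers addr/plen and is strictly shorter than plen. */
static const struct prefix *
find_parent(const struct prefix *tbl, size_t n, uint32_t addr, int plen)
{
	const struct prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].plen >= plen)
			continue;
		if ((addr & plen_mask(tbl[i].plen)) != tbl[i].addr)
			continue;
		if (best == NULL || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return (best);
}
```

With 10.0.0.0/24 and 10.0.0.0/22 left in the table after 10.0.0.0/23 is removed, `lookup` on 10.0.0.0 returns the /24, whereas `find_parent` for 10.0.0.0/23 returns the /22, which is the entry the deleted range should fall back to.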
Alexander V. Chernikov
10e0976103 Simplify nhop operations in ip_output().
Consistently use `nh` instead of always dereferencing
 ro->ro_nh inside the if block.
Always use the nexthop mtu, as it guarantees that the mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
 of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by:	kp

(cherry picked from commit 9748eb7427)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
4151d8ccdc [lltable] Restructure nd6 code.
Factor out lltable locking logic from lltable_try_set_entry_addr()
 into a separate lltable_acquire_wlock(), so the latter can be used
 in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
 and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
 nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
 RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432

(cherry picked from commit 0b79b007eb)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
2802014380 [lltable] Unify datapath feedback mechanism.
Use newly-created llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapath usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

Differential Revision: https://reviews.freebsd.org/D31390

(cherry picked from commit f3a3b06121)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
0f7162e0cd Fix typo in rib_unsibscribe<_locked>().
Submitted by:	Zhenlei Huang<zlei.huang at gmail.com>
Differential Revision: https://reviews.freebsd.org/D31356

(cherry picked from commit 5b42b494d5)
2021-09-07 21:02:57 +00:00
Alexander V. Chernikov
a27813e130 Enforce check for using the return result for ifa?_try_ref().
Suggested by:	hps
Differential Revision:	https://reviews.freebsd.org/D29504

(cherry picked from commit 9e5243d7b6)
2021-09-07 21:01:31 +00:00
Alexander V. Chernikov
efa8c43ed6 Rename variables inside nexthop group consider_resize() code.
No functional changes.

(cherry picked from commit 0f30a36ded)
2021-09-07 21:01:31 +00:00
Alexander V. Chernikov
311cf25c24 Simplify ifa/ifp refcounting in the routing stack.
The routing stack control depends on quite a tree of functions to
 determine the proper attributes of a route such as a source address (ifa)
 or transmit ifp of a route.

When actually inserting a route, the stack needs to ensure that ifa and ifp
 point to entities that are still valid.
Validity means slightly more than just pointer validity: the stack needs a
 guarantee that the provided objects are not scheduled for deletion.

Currently, callers either ignore it (most ifp parts, historically) or try to
 use refcounting (ifa parts). Even in case of ifa refcounting it's not always
 implemented in fully-safe manner. For example, some codepaths inside
 rt_getifa_fib() are referencing ifa while not holding any locks, resulting in
 possibility of referencing scheduled-for-deletion ifa.

Instead of trying to fix all of the callers by enforcing proper refcounting,
 switch to a different model.
As the rib_action() already requires epoch, do not require any stability guarantees
 other than the epoch-provided one.
Use newly-added conditional versions of the refcounting functions
 (ifa_try_ref(), if_try_ref()) and fail if any of these fails.

Reviewed by:	donner
Differential Revision:	https://reviews.freebsd.org/D28837

(cherry picked from commit 5964172837)
2021-09-07 20:55:51 +00:00
Alexander V. Chernikov
04e967d727 Add if_try_ref() to simplify refcount handling inside epoch.
When we have an ifp pointer and the code is running inside epoch,
 epoch guarantees the pointer will not be freed.
However, the following case can still happen:

* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug

This can happen with the current implementation of ifnet_byindex_ref(),
 as we're not holding any locks preventing ifnet deletion by a parallel thread.

To address it, add if_try_ref(), which returns failure when
 referencing an ifp with refcount=0.
Additionally, guard the existing if_ref() with a KASSERT to provide a
 cleaner error in such scenarios.

Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
 if the latter fails.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28836

(cherry picked from commit 7563019bc6)
2021-09-07 20:55:51 +00:00
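The conditional-reference pattern that 04e967d727 introduces can be sketched with C11 atomics. This is a userland illustration under the assumption of a plain counter; `struct obj`, `obj_try_ref` and `obj_rele` are hypothetical names and do not reproduce the kernel's if_try_ref() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not struct ifnet. */
struct obj {
	atomic_uint refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the last reference is already gone, that is,
 * when the object has been scheduled for destruction.
 */
static bool
obj_try_ref(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true when the caller released the last one. */
static bool
obj_rele(struct obj *o)
{
	return (atomic_fetch_sub(&o->refcount, 1) == 1);
}
```

An unconditional increment would turn a count of 0 back into 1 and revive an object whose destruction is already pending; the compare-and-swap loop refuses that transition and lets the caller handle the failure.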
Kristof Provost
09cad040dc pf: Introduce nvlist variant of DIOCGETSTATUS
Make it possible to extend the GETSTATUS call (e.g. when we want to add
new counters, such as for syncookie support) by introducing an
nvlist-based alternative.

MFC after:	1 week
Sponsored by:   Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31694

(cherry picked from commit 2b10cf85f8)
2021-09-06 10:06:23 +02:00
George V. Neville-Neil
c850ca3a17 Restore the vnet before returning an error.
Obtained from:	Kanndula, Dheeraj <Dheeraj.Kandula@netapp.com>

(cherry picked from commit c6b2d024d7)
2021-09-05 18:25:44 -04:00
Luiz Otavio O Souza
cb5a0b7ca3 if_bridge: add ALTQ support
Similar to the recent addition of ALTQ support to if_vlan.

Reviewed by:	donner
Obtained from:	pfsense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31675

(cherry picked from commit eb680a63de)
2021-09-01 15:27:01 +02:00
Luiz Otavio O Souza
984c87891b if_vlan: add the ALTQ support to if_vlan.
Inspired by the iflib implementation, allow ALTQ to be used with if_vlan
interfaces.

Reviewed by:	donner
Obtained from:	pfsense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31647

(cherry picked from commit 2e5ff01d0a)
2021-09-01 15:27:00 +02:00
Kristof Provost
bd0ad8209d altq: Fix panics on rmc_restart()
rmc_restart() is called from a timer, but can trigger traffic. This
means the curvnet context will not be set.
Use the vnet associated with the interface we're currently processing to
set it. We also have to enter net_epoch here, for the same reason.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31642

(cherry picked from commit 159258afb5)
2021-08-30 10:02:14 +02:00
Luiz Otavio O Souza
df8824452d lagg: don't update link layer addresses on destroy
When the lagg is being destroyed it is not necessary to update the
lladdr of all the lagg members every time we update the primary
interface.

Reviewed by:	scottl
Obtained from:	pfSense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31586

(cherry picked from commit c138424148)
2021-08-26 14:05:27 +02:00
Patrick Kelsey
23740b9435 iflib: Improve mapping of TX/RX queues to CPUs
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache.  The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs.  When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs.  See the comment on get_cpuid_for_queue() for the
entire matrix.

The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset       (existing)
dev.<device>.<unit>.iflib.separate_txrx     (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)

The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu

When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.

Reviewed by:	kbowling
Tested by:	olivier, pkelsey
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D24094

(cherry picked from commit ca7005f189)
2021-08-25 16:54:38 -07:00
Franco Fichtner
412a609da4 gre: simplify RSS ifdefs
Use the early break to avoid else definitions. When RSS gains a
runtime option previous constructs would duplicate and convolute
the existing code.

While here init flowid and skip magic numbers and late default
assignment.

Reviewed by:	melifaro, kbowling
Obtained from:	OPNsense
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31584

(cherry picked from commit bb250fae9e)
2021-08-25 16:52:35 -07:00
Stephan de Wit
8cb9af94ff iflib: emulate counters in netmap mode
When iflib devices are in netmap mode the driver
counters are no longer updated, making it look to
userspace tools as if traffic has stopped.

Reported by:	Franco Fichtner <franco@opnsense.org>
Reviewed by:	vmaffione, iflib (erj, gallatin)
Obtained from:	OPNsense
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31550

(cherry picked from commit 66fa12d8fb)
2021-08-25 16:51:42 -07:00
Kyle Evans
8a206a2265 kern: ether_gen_addr: randomize on default hostuuid, too
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.

Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.

Reviewed by:	bz, gbe, imp, kbowling, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29788

(cherry picked from commit 2d741f33bd)
2021-08-20 14:32:31 -07:00
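The fallback in 8a206a2265, generating a random locally administered (LA) unicast MAC address, comes down to two bits in the first octet. The sketch below is userland C; `random_la_unicast` is a hypothetical name, and rand() is only a placeholder for a proper random source.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: fill mac with random bytes, then force the
 * address to be locally administered and unicast.
 * rand() is a stand-in here and is not suitable for real use.
 */
static void
random_la_unicast(uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] |= 0x02;		/* set the locally administered bit */
	mac[0] &= 0xfe;		/* clear the group (multicast) bit */
}
```

Whatever the random bytes are, the low two bits of the first octet always end up as binary 10, which marks the address as locally administered and unicast.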
Alexander V. Chernikov
406a4b37a2 [multipath][nhops] Fix random crashes with high route churn rate.
When a certain multipath route begins flapping really fast, it may
 result in creating multiple identical nexthop groups. The code
 responsible for unlinking unused nexthop groups had an implicit
 assumption that there could be only one nexthop group for the
 same combination of nexthops with weights. This assumption resulted
 in always unlinking the first "identical" group, instead of the
 desired one. Such action, in turn, produced a used-but-unlinked
 nhg along with freed-and-linked nhg, ending up in random crashes.

Similarly, it is possible that multiple identical nexthops get
 created in the case of high route churn, resulting in the same
 problem when deleting one of such nexthops.

Fix by matching the nexthop/nexthop group pointer when deleting the item.

Reported by:	avg
(cherry picked from commit 054948bd81)
2021-08-17 21:14:42 +00:00
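The difference between unlinking by value and unlinking by pointer, which is the core of 406a4b37a2, can be shown on a plain singly linked list. This is a userland sketch; `struct item` and both function names are hypothetical, and the integer `key` merely stands in for a combination of nexthops and weights.

```c
#include <stddef.h>

struct item {
	int key;		/* stands in for the nexthop/weight combination */
	struct item *next;
};

/* Flawed variant: unlink the first item whose key matches. */
static struct item *
unlink_by_key(struct item **head, int key)
{
	for (struct item **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			struct item *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return (found);
		}
	}
	return (NULL);
}

/* Fixed variant: unlink exactly the item the caller holds. */
static struct item *
unlink_by_ptr(struct item **head, struct item *target)
{
	for (struct item **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return (target);
		}
	}
	return (NULL);
}
```

When two items carry the same key, `unlink_by_key` may remove the one that is still in use and leave the one about to be freed on the list; `unlink_by_ptr` cannot make that mistake.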