4762 commits

Author SHA1 Message Date
Mark Johnston
70fd40edb8 debugnet: Fix an error handling bug in the DDB command tokenizer
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c262d5e877)
2022-06-30 10:12:15 -04:00
Mark Johnston
533a247fa8 debugnet: Handle batches of packets from if_input
Some drivers will collect multiple mbuf chains, linked by m_nextpkt,
before passing them to upper layers.  debugnet_pkt_in() didn't handle
this and would process only the first packet, typically leading to
retransmits.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 8414331481)
2022-06-30 10:11:52 -04:00
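
To illustrate the batch handling described in 533a247fa8, here is a minimal userland sketch. It is not the debugnet code: `struct pkt`, its `handled` field, and `handle_batch()` are hypothetical stand-ins, and only the `m_nextpkt`-style link is modeled.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only the packet link is modeled. */
struct pkt {
	struct pkt *nextpkt;	/* next packet in the batch, like m_nextpkt */
	int handled;		/* set once the packet has been processed */
};

/* Process every packet in a batch and return how many were handled. */
static int
handle_batch(struct pkt *head)
{
	struct pkt *p, *next;
	int n = 0;

	for (p = head; p != NULL; p = next) {
		next = p->nextpkt;	/* save the link before detaching */
		p->nextpkt = NULL;	/* each packet is handled on its own */
		p->handled = 1;
		n++;
	}
	return (n);
}
```

Handling only `head` and returning would leave the rest of the chain unprocessed, which is the behavior the commit message describes as leading to retransmits.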
Mark Johnston
2cecf3cfbb bpf: Correct a comment
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c88f6908b4)
2022-06-27 10:11:20 -04:00
Mark Johnston
18c53b8dde bpf: Zero pad bytes preceding BPF headers
BPF headers are word-aligned when copied into the store buffer.  Ensure
that pad bytes following the preceding packet are cleared.

Reported by:	KMSAN
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 60b4ad4b6b)
2022-06-27 10:11:10 -04:00
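
The padding issue in 18c53b8dde can be sketched in userland as follows. This is an illustration only, not the bpf code: `ALIGN_WORD()` and `append_record()` are hypothetical names, and alignment to `sizeof(long)` is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of sizeof(long) (assumed word size). */
#define ALIGN_WORD(x) (((x) + sizeof(long) - 1) & ~(sizeof(long) - 1))

/*
 * Append a record to buf, starting at the next word-aligned offset at or
 * after off.  The gap between the previous record and the aligned start is
 * zeroed so that stale bytes never reach the reader.  The caller must make
 * sure the buffer is large enough.  Returns the offset just past the record.
 */
static size_t
append_record(unsigned char *buf, size_t off, const void *data, size_t len)
{
	size_t start = ALIGN_WORD(off);

	memset(buf + off, 0, start - off);	/* clear the pad bytes */
	memcpy(buf + start, data, len);
	return (start + len);
}
```

Without the `memset()` call the pad bytes would keep whatever the buffer held before, which is the kind of uninitialized read a tool such as KMSAN reports.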
Warner Losh
db761c6a64 Create wrapper for Giant taken for newbus
Create a wrapper for newbus to take Giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D31831

(cherry picked from commit c6df6f5322)
2022-06-21 17:13:20 +02:00
Gordon Bergling
8776e36b83 if_bridge(4): Fix a typo in a source code comment
- s/accross/across/

(cherry picked from commit f7faa4ad48)
2022-06-10 14:28:01 +02:00
Gordon Bergling
7052ecb806 if_llatbl: Fix a typo in a debug statement
- s/droped/dropped/

Obtained from:	NetBSD

(cherry picked from commit 4f493559b0)
2022-06-10 14:27:33 +02:00
Arnaud Ysmal
1315446ba0 LACP: Do not wait response for marker messages not sent
The error returned when a marker message cannot be emitted on a port is not handled.

This causes LACP to block all emissions until the timeout of 3 seconds is reached.

To fix this issue, I just clear the LACP_PORT_MARK flag when the packet could not be emitted.

Differential revision:	https://reviews.freebsd.org/D30467
Obtained from:		Stormshield

(cherry picked from commit 0b92a7fe47)
2022-06-07 05:57:29 +02:00
Andrey V. Elsukov
067a4b656b [vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event
use it to update if_baudrate for vlan interfaces created on the LACP lagg.

Differential revision:	https://reviews.freebsd.org/D33405

(cherry picked from commit f2ab916084)
2022-06-03 06:48:31 +02:00
Kristof Provost
0b666a7c13 net: remove incorrect assertions
This assertion relies on the 80e60e236d change ("ifnet: make if_index
global"), which is not present in stable/13.

This fixes the LINT build (and any configuration with INVARIANTS)

Reported by:    Dimitry Andric <dim@FreeBSD.org>
2022-05-31 12:02:01 +02:00
Kristof Provost
4dfd3ffc44 if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also cover
if_vmove() (and not just detach).

PR:		262829
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D34704

(cherry picked from commit 868bf82153)
2022-05-27 18:25:10 +02:00
Mitchell Horne
eda5293cd7 debugnet: fix an errant assertion
We may call debugnet_free() before g_debugnet_pcb_inuse is true,
specifically in the cases where the interface is down or does not
support debugnet. pcb->dp_drv_input is used to hold the real driver
if_input callback while debugnet is in use, so we can check the status
of this field in the assertion.

This can be triggered trivially by trying to configure netdump on an
unsupported interface at the ddb prompt.

Initializing the dp_drv_input field to NULL explicitly is not necessary
but helps display the intent.

PR:		263929
Reported by:	Martin Filla <freebsd@sysctl.cz>
Reviewed by:	cem, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35179

(cherry picked from commit a84bf5eaa1)
2022-05-27 13:22:48 -03:00
Kurosawa Takahiro
94cea2fc07 rtsock: fix a stack overflow
struct sockaddr is not sufficient for a buffer that can hold any
sockaddr_* structure. struct sockaddr_storage should be used.

Test:
ifconfig epair create
ifconfig epair0a inet6 add 2001:db8::1 up
ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow

Reviewed by:	markj, kp
Differential Revision:	https://reviews.freebsd.org/D35188

(cherry picked from commit 9573cc3555)
2022-05-25 10:13:34 +02:00
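
The size relationship behind 94cea2fc07 can be checked from userland. The sketch below is not the rtsock code; `save_sockaddr()` is a hypothetical helper that copies an address into a `struct sockaddr_storage` and rejects anything that would not fit.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Copy an address of any family into dst.  struct sockaddr_storage is
 * defined to be large enough for every sockaddr_* type, whereas a plain
 * struct sockaddr is smaller than, for example, struct sockaddr_in6.
 * Returns 0 on success, -1 if len exceeds the storage size.
 */
static int
save_sockaddr(struct sockaddr_storage *dst, const struct sockaddr *src,
    socklen_t len)
{
	if (len > sizeof(*dst))
		return (-1);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	return (0);
}
```

Copying a `struct sockaddr_in6` into a stack variable of type `struct sockaddr` instead would write past the end of that variable.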
Kristof Provost
08135bd1fa epair: unbind prior to returning to userspace
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.

MFC after:	1 week
Sponsored by:	Orange Business Services

(cherry picked from commit cbbce42345)
2022-05-14 11:10:47 +02:00
Kristof Provost
7660a72217 epair: fix set but not used warning
If 'options RSS' is set.

MFC after:	1 week
Sponsored by:	Orange Business Services

(cherry picked from commit a6b0c8d04d)
2022-05-14 11:10:29 +02:00
John Baldwin
3e7bd3391f iflib: Cast the result of iflib_netmap_txq_init() to void.
This fixes a warning from GCC for kernels without netmap since the
return value is never used.

Reviewed by:	vmaffione, erj
Differential Revision:	https://reviews.freebsd.org/D28598

(cherry picked from commit 2ccf971ace)
2022-05-10 13:59:07 -07:00
Greg Foster
3fbee9be25 lacp: short timeout erroneously declares link-flapping
Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.

On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.

On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.

The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.

Reported by:	Greg Foster <gfoster@panasas.com>
Reviewed by:	gallatin
Tested by:	Greg Foster <gfoster@panasas.com>
MFC after:	3 days
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D35070

(cherry picked from commit 00a80538b4)
2022-05-01 12:16:18 -07:00
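
The timer behavior described in 3fbee9be25 can be modeled roughly as below. This is a simplified sketch, not the lacp code: `struct lacp_timer_model`, `pdu_received()`, and `timer_tick()` are hypothetical names, times are in milliseconds, and the three-tick budget is taken from the commit message.

```c
/* Hypothetical model of the short-timeout countdown. */
struct lacp_timer_model {
	int remaining;		/* ticks left before the link is declared down */
	long last_pdu_ms;	/* arrival time of the most recent PDU */
};

/* A PDU arrived: restart the countdown and remember when it arrived. */
static void
pdu_received(struct lacp_timer_model *t, long now_ms)
{
	t->remaining = 3;
	t->last_pdu_ms = now_ms;
}

/*
 * Periodic timer, run once per second.  The countdown is only decremented
 * if at least a full second has passed since the last PDU.  Returns 1 when
 * the countdown has reached zero, otherwise 0.
 */
static int
timer_tick(struct lacp_timer_model *t, long now_ms)
{
	if (now_ms - t->last_pdu_ms < 1000)
		return (0);	/* PDU arrived too recently to count this tick */
	if (t->remaining > 0)
		t->remaining--;
	return (t->remaining == 0);
}
```

With a PDU at 900 ms and ticks at 1000, 2000, 3000, and 4000 ms, the first tick is skipped and the countdown reaches zero at 4000 ms. Without the elapsed-time check it would reach zero at 3000 ms, only 2.1 seconds after the PDU.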
Mark Johnston
421c2f93a4 net: Fix memory leaks in lltable_calc_llheader() error paths
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended.  There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.

Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 990a6d18b0)
2022-04-15 10:21:20 -04:00
Gordon Bergling
7d7efb85cc net(4): Fix a typo in a source code comment
- s/accomodate/accommodate/

(cherry picked from commit d792dc7ebb)
2022-04-09 08:16:21 +02:00
Gordon Bergling
03c693982f net(3): Fix a typo in a source code comment
- s/paramenters/parameters/

(cherry picked from commit 23677398ca)
2022-04-09 08:12:28 +02:00
Gordon Bergling
64ddf8b65c net(3): Fix a typo in a source code comment
- s/Multilik/Multilink/

Obtained from:	NetBSD

(cherry picked from commit f8d292b665)
2022-04-09 08:10:23 +02:00
Gordon Bergling
c6ceb47932 net(3): Fix a typo in a source code comment
- s/verion/version/

(cherry picked from commit cba46da538)
2022-04-09 08:09:25 +02:00
Gordon Bergling
9d109792c0 vxlan(4): Fix two typos in sysctl descriptions
- s/fowarding/forwarding/

(cherry picked from commit bef80a7285)
2022-04-02 15:33:48 +02:00
Ed Maste
85f3c0a1c3 Fix kernel build without INET6
Reported by:	Gary Jennejohn
Fixes:		ff3a85d324 ("[lltable] Add per-family lltable ...")
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 818952c638)
2022-03-30 17:50:42 +00:00
Ed Maste
1c487a5d3d Fix kernel build without INET and INET6
Reviewed by:	brooks, melifaro
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33718

(cherry picked from commit a6668e31aa)
2022-03-30 17:50:42 +00:00
Gordon Bergling
f23ba13a58 pf(4): Fix a typo in a source code comment
- s/seaching/searching/

(cherry picked from commit ef88adc527)
2022-03-30 18:36:06 +02:00
Mateusz Guzik
229bb65656 pf: add PF_UNLNKDRULES_ASSERT
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit f11b6505f1)
2022-03-28 11:38:23 +00:00
Alexander V. Chernikov
eb22b73358 routing: Add unified level-based logging support for the routing subsystem.
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33664

(cherry picked from commit 63f7f3921b)
2022-03-28 08:48:12 +00:00
Alexander V. Chernikov
519bdbb448 nhops: split nh_family into nh_upper_family and nh_neigh_family.
With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need
 to distinguish "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway
 address family. Store them explicitly in the private part of the nexthop data.

While here, store nhop fibnum in nhop_prip data structure to make it self-contained.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33663

(cherry picked from commit 823a08d740)
2022-03-28 08:47:52 +00:00
Alexander V. Chernikov
ffcca53561 [lltable] Add per-family lltable getters.
Introduce a new function, lltable_get(), to retrieve lltable pointer
 for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
 ARP/ND records.

Differential Revision: https://reviews.freebsd.org/D33660
MFC after:	2 weeks

(cherry picked from commit ff3a85d324)
2022-03-28 08:47:38 +00:00
Kristof Provost
f6138d93b5 if_epair: build fix
66acf7685b failed to build on riscv (and mips). This is because the
atomic_testandset_int() (and friends) functions do not exist there.
Happily those platforms do have the long variant, so switch to that.

PR:		262571
MFC after:	3 days

(cherry picked from commit 0bf7acd6b7)
2022-03-20 01:25:03 +01:00
Michael Gmelin
bb9ad300f0 if_epair: fix race condition on multi-core systems
As an unwanted side effect of the performance improvements in
24f0bfbad5, epair interfaces stop forwarding traffic on higher
load levels when running on multi-core systems.

This happens due to a race condition in the logic that decides when to
place work in the task queue(s) responsible for processing the content
of ring buffers.

In order to fix this, a field named state is added to the epair_queue
structure. This field is used by the affected functions to signal each
other that something happened in the underlying ring buffers that might
require work to be scheduled in task queue(s), replacing the existing
logic, which relied on checking if ring buffers are empty or not.

epair_menq() does:
  - set BIT_MBUF_QUEUED
  - queue mbuf
  - if testandset BIT_QUEUE_TASK:
      enqueue task

epair_tx_start_deferred() does:
  - swap ring buffers
  - process mbufs
  - clear BIT_QUEUE_TASK
  - if testandclear BIT_MBUF_QUEUED
      enqueue task

PR:		262571
Approved by:    re (gjb, early MFC)
Reported by:	Johan Hendriks <joh.hendriks@gmail.com>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34569

(cherry picked from commit 66acf7685b)
2022-03-17 00:38:33 +01:00
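
The signalling scheme listed in bb9ad300f0 can be sketched with C11 atomics. This is a single-threaded model for illustration, not the if_epair code: `struct queue_model`, `producer_enqueue()`, and `consumer_run()` are hypothetical, the bit values are arbitrary, and the sketch assumes the producer schedules the task only when BIT_QUEUE_TASK was previously clear.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BIT_QUEUE_TASK	(1UL << 0)	/* a task run is already scheduled */
#define BIT_MBUF_QUEUED	(1UL << 1)	/* something was put in the ring */

struct queue_model {
	atomic_ulong state;
	int pending;	/* packets waiting; stands in for the ring buffer */
	int enqueued;	/* number of times the task was scheduled */
};

/* Set bit and report whether it was already set. */
static bool
test_and_set(atomic_ulong *s, unsigned long bit)
{
	return ((atomic_fetch_or(s, bit) & bit) != 0);
}

/* Clear bit and report whether it was set before. */
static bool
test_and_clear(atomic_ulong *s, unsigned long bit)
{
	return ((atomic_fetch_and(s, ~bit) & bit) != 0);
}

/* Models the epair_menq() steps. */
static void
producer_enqueue(struct queue_model *q)
{
	atomic_fetch_or(&q->state, BIT_MBUF_QUEUED);
	q->pending++;
	if (!test_and_set(&q->state, BIT_QUEUE_TASK))
		q->enqueued++;		/* schedule the task once */
}

/* Models the epair_tx_start_deferred() steps. */
static void
consumer_run(struct queue_model *q)
{
	q->pending = 0;			/* swap rings and process mbufs */
	atomic_fetch_and(&q->state, ~BIT_QUEUE_TASK);
	if (test_and_clear(&q->state, BIT_MBUF_QUEUED))
		q->enqueued++;		/* work may have arrived; run again */
}
```

The point of the two bits is that the decision to schedule work no longer depends on whether a ring buffer looks empty at a particular instant.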
Santiago Martinez
20ea94a9ec if_epair: fix build with RSS and INET or INET6 disabled
Reviewed by:	kp
MFC after:	1 week

(cherry picked from commit 52bcdc5b80)
2022-03-10 09:51:41 +01:00
Eric Joyner
bde1cafb7c iflib: Allow drivers to determine which queue to TX on
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.

(The motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	kbowling@, stallamr@netapp.com
Tested by:	jeffrey.e.pieper@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D31485

(cherry picked from commit 213e91399b)
2022-03-07 16:10:27 -08:00
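
The idea in bde1cafb7c, an optional driver callback that picks the TX queue, might look roughly like this in a userland model. All names here (`struct txrx_ops_model`, `select_queue`, `select_by_pcp()`, `pick_queue()`) are hypothetical and do not reflect the actual iflib interface; the only outside fact used is that the priority field occupies the top three bits of the 802.1Q tag control information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table with an optional queue-selection callback. */
struct txrx_ops_model {
	unsigned (*select_queue)(uint16_t vlan_tci, unsigned nqueues);
};

/* Example callback: map the VLAN priority (top 3 bits of the TCI) to a queue. */
static unsigned
select_by_pcp(uint16_t vlan_tci, unsigned nqueues)
{
	return (((vlan_tci >> 13) & 0x7) % nqueues);
}

/*
 * Use the driver's callback when one is set; otherwise fall back to a
 * flow-id based choice.  nqueues must be non-zero.
 */
static unsigned
pick_queue(const struct txrx_ops_model *ops, uint16_t vlan_tci,
    unsigned flowid, unsigned nqueues)
{
	if (ops->select_queue != NULL)
		return (ops->select_queue(vlan_tci, nqueues));
	return (flowid % nqueues);
}
```

Leaving the callback unset keeps the default behavior, so drivers that do not need priority-based selection are unaffected in this model.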
Ryan Stone
617729314a Fix ifa refcount leak in ifa_ifwithnet()
In 4f6c66cc9c, ifa_ifwithnet() was changed to no longer
ifa_ref() the returned ifaddr, and instead the caller was required
to stay in the net_epoch for as long as they wanted the ifaddr
to remain valid.  However, this missed the case where an AF_LINK
lookup would call ifaddr_byindex(), which still does ifa_ref()
the ifaddr.  This would cause a refcount leak.

Fix this by inlining the relevant parts of ifaddr_byindex() here,
with the ifa_ref() call removed.  This also avoids an unnecessary
entry and exit from the net_epoch for this case.

I've audited all in-tree consumers of ifa_ifwithnet() that could
possibly perform an AF_LINK lookup and confirmed that none of them
will expect the ifaddr to have a reference that they need to
release.

MFC after: 2 months
Sponsored by: Dell Inc
Differential Revision:	https://reviews.freebsd.org/D28705
Reviewed by: melifaro

(cherry picked from commit 5adea417d4)
2022-03-07 12:41:40 -05:00
Kristof Provost
226bb05ebc bridge: Don't share broadcast packets
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.

Use m_dup() to create independent mbufs instead.

Reported by:	Richard Russo <toast@ruka.org>
Reviewed by:	donner, afedorov
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D34319

(cherry picked from commit 36637dd19d)
2022-02-28 16:38:05 +01:00
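
The difference between a shared and an independent copy, as discussed in 226bb05ebc, can be shown with a small userland model. `struct pktbuf`, `share_packet()`, and `dup_packet()` are hypothetical; they only imitate the sharing semantics attributed to m_copypacket() and m_dup() in the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet: a pointer to payload bytes plus a length. */
struct pktbuf {
	unsigned char *data;
	size_t len;
};

/* Shared copy: the new packet points at the same bytes as the original. */
static struct pktbuf
share_packet(const struct pktbuf *p)
{
	return (*p);
}

/*
 * Independent copy: the payload is duplicated.  On allocation failure the
 * returned packet has data == NULL and len == 0.  The caller frees data.
 */
static struct pktbuf
dup_packet(const struct pktbuf *p)
{
	struct pktbuf c = { NULL, 0 };

	c.data = malloc(p->len);
	if (c.data != NULL) {
		memcpy(c.data, p->data, p->len);
		c.len = p->len;
	}
	return (c);
}
```

A consumer that modifies the original in place also changes every shared copy, while an independent copy keeps the bytes it was created with.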
Kristof Provost
2e0bee4c7f if_epair: implement fanout
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.

We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.

Benchmark results:
With net.isr.maxthreads=-1

Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

Before          627 Kpps
After (no RSS)  1.198 Mpps
After (RSS)     3.148 Mpps

Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

Before          7.705 Kpps
After (no RSS)  1.017 Mpps
After (RSS)     2.083 Mpps

MFC after:	3 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D33731

(cherry picked from commit 24f0bfbad5)
2022-02-23 16:39:04 +01:00
Aleksandr Fedorov
543b492fa5 if_vxlan(4): Allow netmap_generic to intercept RX packets.
Netmap (generic) intercepts the if_input method to handle RX packets.

Call ifp->if_input() instead of netisr_dispatch().
Add stricter check for incoming packet length.

This change is very useful with bhyve + vale + if_vxlan.

Reviewed by:	vmaffione (mentor), kib, np, donner
Approved by:	vmaffione (mentor), kib, np, donner
MFC after:	2 weeks
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D30638

(cherry picked from commit ceaf442ff2)
2022-02-23 14:01:20 +03:00
Vincenzo Maffione
b425101ab6 netmap: fix LOR in iflib_netmap_register
In iflib_device_register(), the CTX_LOCK is acquired first and then
IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg()
we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK
is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping
the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional
IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive.

MFC after:	1 month

(cherry picked from commit e0e1240528)
2022-02-13 10:19:26 +00:00
Kristof Provost
c6745a0cc4 pflog: align header to 4 bytes, not 8
6d4baa0d01 incorrectly rounded the length of the pflog header up to 8
bytes, rather than 4.

PR:		261566
Reported by:	Guy Harris <gharris@sonic.net>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit 4daa31c108)
2022-02-09 10:40:58 +01:00
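
The effect of the alignment choice in c6745a0cc4 can be seen with a generic round-up helper. `round_up()` is a hypothetical function and the length 65 is an arbitrary example value, not the real pflog header size.

```c
#include <stddef.h>

/* Round len up to the next multiple of align (align must be non-zero). */
static size_t
round_up(size_t len, size_t align)
{
	return ((len + align - 1) / align * align);
}
```

For an example length of 65 bytes, `round_up(65, 4)` gives 68 while `round_up(65, 8)` gives 72, so rounding to 8 instead of 4 can add four extra bytes of padding that a reader expecting 4-byte alignment would misinterpret.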
Kristof Provost
0f7841b31c pf: make if_pflog.h self-contained
Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33503

(cherry picked from commit dc04fa802d)
2022-02-09 10:40:58 +01:00
Ed Maste
94e6d14488 Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9feff969a0)
2022-02-08 15:00:55 -05:00
Mark Johnston
a409ae5837 pf: Initialize pf_kpool mutexes earlier
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.

Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there.  This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.

Constify some related functions while here and add a regression test
based on a syzkaller reproducer.

Reported by:	syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by:	kp
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 773e3a71b2)
2022-02-07 09:10:16 -05:00
Kristof Provost
650607380c pf: protect the rpool from races
The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.

For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).

Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).

See also:	https://redmine.pfsense.org/issues/12660
Reviewed by:	glebius
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33874

(cherry picked from commit 5f5e32f1b3)
2022-02-04 11:37:14 +01:00
Stefan Eßer
dc4114875e Make CPU_SET macros compliant with other implementations
(cherry picked from commit e2650af157)
2022-01-14 18:17:30 +02:00
Vincenzo Maffione
6000a417d3 net: iflib: sync isc_capenable to if_capenable
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.

PR:		260068
Reviewed by:	kbowling, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33156

(cherry picked from commit 4561c4f0ca)
2022-01-06 09:53:08 +00:00
Kristof Provost
80c2f5fc0a if_pflog: fix packet length
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).

The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).

Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.

Reported by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
Tested by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33236

(cherry picked from commit 6d4baa0d01)
2021-12-11 10:38:50 +01:00
Zhenlei Huang
5346a9a2e0 if_epair: Also mark the flag of pair b with IFF_KNOWSEPOCH
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33210

(cherry picked from commit 73d41cc730)
2021-12-11 10:38:17 +01:00
Andriy Gapon
5f24d2a82c iflib_stop: drain rx tasks to prevent any data races
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists.  So, iflib_stop has to ensure that the
rx task threads are not active.

This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.

The crash has been seen on VMWare guests with vmxnet3 driver.

My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.

But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.

PR:		259458

(cherry picked from commit 1bfdb812c7)
2021-12-10 14:32:37 +02:00
Alexander V. Chernikov
6ef62def60 routing: Use the same index space for both nexthop and nexthop groups.
This simplifies userland object handling along with kernel-level
 nexthop handling in fib algo framework.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D32342

(cherry picked from commit 7e64580b5f)
2021-12-04 19:03:05 +00:00