Rather than compute ilog2(roundup_pow_of_two(x)), which invokes ilog2
twice, just use order_base_2 once. And employ that optimization
twice.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46838
Add vlan tag match to RX FS SA and policy rules
and report SA lifetime counter on vlan interface
in case SA was installed on vlan interface
Existing code didn't have the net tag id as part of
the FS matching rules. This can cause applying
ipsec offload to the wrong interface.
This commit add tag id as part of FS matchers
and treat tag value 0 as no tag
Sponsored by: NVidia networking
Do not prepend ipsec tags into mbuf head when preparing rx wqe, store it
separately. Only prepend (and clear the store) when received packed was
indeed offloaded by ipsec engine. Then we do not need to refill tags
for slots that received non-ipsec packets.
This should solve some minimal degradation of the rx CPU usage due to
unneeded tag allocation for each packet.
Sponsored by: NVidia networking
When the kernel is compiled with options RATELIMIT, the
mlx5en driver cannot detach. It gets stuck waiting for all
kernel users of its rates to drop to zero before finally calling
ether_ifdetach.
The tcp ratelimit code has an eventhandler for ifnet departure
which causes rates to be released. However, this is called as an
ifnet departure eventhandler, which is invoked as part of
ifdetach(), via either_ifdetach(). This means that the tcp
ratelimit code holds down many hw rates when the mlx5en driver
is waiting for the rate count to go to 0. Thus devctl detach
will deadlock on mlx5 with this stack:
mi_switch+0xcf sleepq_timedwait+0x2f _sleep+0x1a3 pause_sbt+0x77 mlx5e_destroy_ifp+0xaf mlx5_remove_device+0xa7 mlx5_unregister_device+0x78 mlx5_unload_one+0x10a remove_one+0x1e linux_pci_detach_device+0x36 linux_pci_detach+0x24 device_detach+0x180 devctl2_ioctl+0x3dc devfs_ioctl+0xbb vn_ioctl+0xca devfs_ioctl_f+0x1e kern_ioctl+0x1c3 sys_ioctl+0x10a
To fix this, provide an explicit API for a driver to call the tcp
ratelimit code telling it to detach itself from an ifnet. This
allows the mlx5 driver to unload cleanly. I considered adding an
ifnet pre-departure eventhandler. However, that would need to be
invoked by the driver, so a simple function call seemed better.
The mlx5en driver has been updated to call this function.
Reviewed by: kib, rrs
Differential Revision: https://reviews.freebsd.org/D46221
Sponsored by: Netflix
Right now, only IPv4 transport mode, with aes-gcm ESP, is supported.
Driver also cooperates with NAT-T, and obeys socket policies, which
makes IKEd like StrongSwan working.
Sponsored by: NVIDIA networking
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
In several places, a loop tests for powers of two, or iterates through
powers of two. In those places, replace the loop with an invocation
of fls or ilog2 without changing the meaning of the code.
Reviewed by: alc, markj, kib, np, erj, avg (previous version)
Differential Revision: https://reviews.freebsd.org/D45494
mlx4 and mlx5 are Ethernet devices and ether_ifattach() does an
unconditional bpfattach(). From commit 16d878cc99 [1] and on, we
should not check ifp->if_bpf to tell us whether or not we have any bpf
peers that might be interested in receiving packets. And since commit
2b9600b449 [2], ifp->if_bpf can not be NULL even after the network
interface has been detached.
No functional change intended.
1. 16d878cc99 Fix the following bpf(4) race condition which can result in a panic
2. 2b9600b449 Add dead_bpf_if structure, that should be used as fake bpf_if during ifnet detach
Reviewed by: kp, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45196
These functions may map more memory for DMA than is actually used, since
the allocator operates on multiples of a 4KB page size. Thus,
bus_dmamap_sync() can trigger KMSAN reports when the unused portion of
a page is not zero-ed.
Reported by: KMSAN
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43133
Previously we were trying to set hca_cap_2 without checking if
sw_vhca_id_valid max value, which is the only settable value inside
hca_cap_2, and seeing that we dont have driver support for sw_vhca_id
yet there is no need to set hca_cap_2 at all, it is enough to query it.
Fixes: 7b959396ca ("mlx5: Introduce new destination type TABLE_TYPE")
MFC after: 3 days
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
If removing a node of type FS_TYPE_FLOW_DEST we lock the flow group too
late. This can lead to a deadlock with fs_add_dst_fg().
PR: 274715
MFC after: 1 week
Reviewed by: kib
Tested by: mm
Differential Revision: https://reviews.freebsd.org/D42368
This new destination type supports flow transition between different
table types, e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.
In addition add driver support to be able to query the capability for
this new destination type.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
This allows to insert a rule and make sure it doesn't get
combined by the steering layer with any other rule.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Exclude the last n entries for an autogrouped flow table.
Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
Linux upstream commit: 79cdb0aaea8b5478db34afa1d4d5ecc808689a67
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Once all the large flow groups (defined by the user when the flow table
is created - max_num_groups) were created, then all the following new
flow groups will have only one flow table entry, even though the flow table
has place to larger groups.
Fix the condition to prefer large flow group.
Upstream Linux commit: 97fd8da281f80e7e69e0114bc906575734d4dfaf
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Users of the steering APIs shouldn't use the PRM directly.
Create an software enum to be used instead.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Create a struct to hold flow actions to be used when creating
a flow rule.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
The specification for the Toeplitz function doesn't require to set the key
explicitly to be symmetric. In case a symmetric functionality is required
a symmetric key can be simply used.
Wrongly forcing the algorithm to symmetric causes the wrong packet
distribution and a performance degradation.
Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org
Fixes: 28d6137008b2 ("IB/mlx5: Add RSS QP support")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Alex Vainman <alexv@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
[khng: cherry-picked from Linux
b7165bd0d6cbb93732559be6ea8774653b204480]
Sponsored by: Juniper Networks, Inc.
MFC after: 7 days
Reviewed by: kib, zlei
Differential Revision: https://reviews.freebsd.org/D42178
Failure to get mbufs may be transient.
Don't permanently fail to open the channels due to lack of mbufs.
This also makes modifying channel parameters faster.
MFC after: 1 week
Sponsored by: NVIDIA Networking
Implement one CQ modify function supporting all firmware versions,
instead of having more variants of CQ modify.
MFC after: 1 week
Sponsored by: NVIDIA Networking
When using hardware pacing, this value can be increased, because more SQ's
means more EQ events aswell. Make it tunable, hw.mlx5.comp_eq_size .
MFC after: 1 week
Sponsored by: NVIDIA Networking
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510