Commit graph

634 commits

Author SHA1 Message Date
Ariel Ehrenberg
2fb2c03512 mlx5_core: fix "no space" error on sriov enablement
Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define
is used to request the available maximum sized flow table, and zero doesn't
make sense for it, whereas many places in the driver use zero explicitly
expecting the smallest table size possible but instead due to this
define they end up allocating the biggest table size unawarely.

Sponsored by:	NVidia networking
2024-12-16 00:27:53 +02:00
Ariel Ehrenberg
29a9d7c6ce mlx5_core: fix panic on sriov enablement
Align the code of fdb steering with flow steering core
and add missing parts in namespace initialization and
in prio logic

PR:	281714
Sponsored by:	NVidia networking
2024-12-16 00:27:31 +02:00
Richard Scheffenegger
0fc7bdc978 tcp: extend the use of the th_flags accessor function
Formally, there are 12 bits for TCP header flags.
Use the accessor functions in more (kernel) places.

No functional change.

Reviewed By: cc, #transport, cy, glebius, #iflib, kbowling
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47063
2024-11-29 09:48:23 +01:00
Konstantin Belousov
4cc5d081d8 mlx5en: only enable to toggle offload caps if they are supported
Reviewed by:	Ariel Ehrenberg <aehrenberg@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2024-11-26 14:34:34 +02:00
Konstantin Belousov
cca0dc49e0 mlx5en: move runtime capabilities checks into helper functions
For TLS TX/RX, ratelimit, and IPSEC offload caps.

Reviewed by:	Ariel Ehrenberg <aehrenberg@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2024-11-26 14:34:34 +02:00
Gleb Smirnoff
67f9307907 mlx5e tls: use non-sleeping malloc flag is it was intended
Reviewed by:	gallatin
Fixes:		81b38bce07
2024-11-25 10:46:13 -08:00
Ariel Ehrenberg
253a1fa16b mlx5: Fix handling of port_module_event
Remove the array of port module status and instead save module status
and module number.

At boot, for each PCI function driver get event from fw about module
status. The event contains module number and module status. Driver
stores module number and module status..  When user (ifconfig) ask for
modules information, for each pci function driver first queries fw to
get module number of current pci function, then driver compares the
module number to the module number it stored before and if it matches
and module status is "plugged and enabled" then driver queries fw for
the eprom information of that module number and return it to the
caller.

In fact fw could have concluded that required module number of the
current pci function, but fw is not implemented this way. current
design of PRM/FW is that MCIA register handling is only aware of
modules, not the pci function->module connections.  FW is designed to
take the module number written to MCIA and write/read the content
to/from the associated module's EPROM.

So, based on current FW design, we must supply the module num so fw
can find the corresponding I2C interface of the module to write/read.

Sponsored by:	NVidia networking
MFC after:	1 week
2024-11-23 12:59:26 +02:00
Konstantin Belousov
0d38b0bc8f mlx5en: fix the sign of mlx5e_tls_st_init() error, convert from Linux to BSD
Sponsored by:	NVidia networking
MFC after:	1 week
2024-11-23 12:09:50 +02:00
Konstantin Belousov
64bf5a431c mlx5_en: style function prototype
Sponsored by:	NVidia networking
MFC after:	2 weeks
2024-11-23 12:01:50 +02:00
Andrew Gallatin
81b38bce07 mlx5e tls: Ensure all allocated tags have a hw context associated
Ensure all allocated tags have a hardware context associated.
The hardware context allocation is moved into the zone import
routine, as suggested by kib.  This is safe because these zone
allocations are always done in a sleepable context.

I have removed the now pointless num_resources tracking,
and added sysctls / tunables to control UMA zone limits
for these tls tags, as well as a tunable to let the
driver pre-allocate tags at boot.

MFC after:	2 weeks
2024-11-23 12:01:50 +02:00
Konstantin Belousov
de7a92756f mlx5en: improve reporting of kernel TLS, IPSEC offload, and ratelimit caps
Only ever set the capabilities bits if kernel options are enabled.
Check for hardware capabilities before setting software bits.

Sponsored by:	NVidia networking
MFC after:	1 week
2024-11-14 00:56:11 +02:00
Andrew Gallatin
49597c3e84 mlx5e: Use M_WAITOK when allocating TLS tags
Now that it is clear we're in a sleepable context, use
M_WAITOK when allocating TLS tags.

Suggested by: kib
Sponsored by: Netflix
2024-10-23 15:56:14 -04:00
Andrew Gallatin
81dbc22ce8 mlx5e: Immediately initialize TLS send tags
Under massive connection thrashing (web server restarting), we see
long periods where the web server blocks when enabling ktls offload
when NIC ktls offload is enabled.

It turns out the driver uses a single-threaded linux work queue to
serialize the commands that must be sent to the nic to allocate and
free tls resources. When freeing sessions, this work is handled
asynchronously. However, when allocating sessions, the work is handled
synchronously and the driver waits for the work to complete before
returning. When under massive connection thrashing, the work queue is
first filled by TLS sessions closing. Then when new sessions arrive,
the web server enables kTLS and blocks while the tens or hundreds of
thousands of sessions closes queued up are processed by the NIC.

Rather than using the work queue to open a TLS session on the NIC,
switch to doing the open directly. This allows use to cut in front of
all those sessions that are waiting to close, and minimize the amount
of time the web server blocks. The risk is that the NIC may be out of
resources because it has not processed all of those session frees. So
if we fail to open a session directly, we fall back to using the work
queue.

Differential Revision: https://reviews.freebsd.org/D47260
Sponsored by: Netflix
Reviewed by: kib
2024-10-23 15:16:19 -04:00
Konstantin Belousov
8e5b07dd08 mlx5_ipsec: add enough #ifdef IPSEC_OFFLOAD to make LINT_NOIP compilable
Reported by:	kp
Sponsored by:	NVidia networking
Fixes:	2851aafe96
2024-10-10 16:18:11 +03:00
Konstantin Belousov
2851aafe96 mlx5 ipsec_offload: ensure that driver does not dereference dead sahindex
Take the sahtree rlock and check for the DEAD SA state before validating
and filling the SA xfrm attributes.

Sponsored by:	NVidia networking
2024-10-10 12:55:45 +03:00
Doug Moore
5a5da24fc8 mlx5: optimize ilog2 calculation
Rather than compute ilog2(roundup_pow_of_two(x)), which invokes ilog2
twice, just use order_base_2 once.  And employ that optimization
twice.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D46838
2024-09-28 16:24:44 -05:00
Ariel Ehrenberg
3503aa0cdc mlx5en: Handle install SA for ipv6 encapsulated case
Pass the right encap flag to network card

Sponsored by:	NVidia networking
MFC after:	1 week
2024-09-22 19:06:02 +03:00
Konstantin Belousov
957e389ca7 dev/mlx5: remove some duplicated macros from device.h
Sponsored by:	NVidia networking
2024-09-04 11:49:38 +03:00
Ariel Ehrenberg
205263ac25 mlx5en: support ipsec offload on vlan if
Add vlan tag match to RX FS SA and policy rules
and report SA lifetime counter on vlan interface
in case SA was installed on vlan interface

Existing code didn't have the net tag id as part of
the FS matching rules. This can cause applying
ipsec offload to the wrong interface.
This commit add tag id as part of FS matchers
and treat tag value 0 as no tag

Sponsored by:   NVidia networking
2024-08-20 15:42:13 +03:00
Konstantin Belousov
828da10bb3 mlx5en: fix destroying tx sa_entry when installing rx sa_entry failed
In particular, do not cancel freed linux delayed work.

Sponsored by:	NVidia networking
2024-08-20 15:42:12 +03:00
Konstantin Belousov
d00f3505ef mlx5en: do now waste ipsec_accel_in_tag on non-ipsec packets rx
Do not prepend ipsec tags into mbuf head when preparing rx wqe, store it
separately.  Only prepend (and clear the store) when received packed was
indeed offloaded by ipsec engine.  Then we do not need to refill tags
for slots that received non-ipsec packets.

This should solve some minimal degradation of the rx CPU usage due to
unneeded tag allocation for each packet.

Sponsored by:	NVidia networking
2024-08-20 15:42:12 +03:00
Konstantin Belousov
2787f8c39c mlx5en: stop including mlx5_accel/ipsec.h from en.h
This creates a circular dependency preventing inline functions from
ipsec.h from using en.h definitions.

Sponsored by:	NVidia networking
2024-08-20 15:42:12 +03:00
Mark Johnston
27211b7998 mlx5: Remove a less than helpful debug print
Reviewed by:	khng
Fixes:	e23731db48 ("mlx5en: add IPSEC_OFFLOAD support")
Differential Revision:	https://reviews.freebsd.org/D46273
2024-08-12 23:06:01 +00:00
Andrew Gallatin
1f628be888 tcp_ratelimit: provide an api for drivers to release ratesets at detach
When the kernel is compiled with options RATELIMIT, the
mlx5en driver cannot detach. It gets stuck waiting for all
kernel users of its rates to drop to zero before finally calling
ether_ifdetach.

The tcp ratelimit code has an eventhandler for ifnet departure
which causes rates to be released. However, this is called as an
ifnet departure eventhandler, which is invoked as part of
ifdetach(), via either_ifdetach(). This means that the tcp
ratelimit code holds down many hw rates when the mlx5en driver
is waiting for the rate count to go to 0. Thus devctl detach
will deadlock on mlx5 with this stack:
mi_switch+0xcf sleepq_timedwait+0x2f _sleep+0x1a3 pause_sbt+0x77 mlx5e_destroy_ifp+0xaf mlx5_remove_device+0xa7 mlx5_unregister_device+0x78 mlx5_unload_one+0x10a remove_one+0x1e linux_pci_detach_device+0x36 linux_pci_detach+0x24 device_detach+0x180 devctl2_ioctl+0x3dc devfs_ioctl+0xbb vn_ioctl+0xca devfs_ioctl_f+0x1e kern_ioctl+0x1c3 sys_ioctl+0x10a

To fix this, provide an explicit API for a driver to call the tcp
ratelimit code telling it to detach itself from an ifnet. This
allows the mlx5 driver to unload cleanly. I considered adding an
ifnet pre-departure eventhandler. However, that would need to be
invoked by the driver, so a simple function call seemed better.

The mlx5en driver has been updated to call this function.

Reviewed by: kib, rrs

Differential Revision:	https://reviews.freebsd.org/D46221
Sponsored by: Netflix
2024-08-05 12:51:35 -04:00
Konstantin Belousov
2204a48290 mlx5en: limit reporting eeprom read failure due to unplugged module to verboseboot
Requested by:	gallatin
Sponsored by:	NVIDIA networking
MFC after:	1 week
2024-07-30 18:00:04 +03:00
Konstantin Belousov
e23731db48 mlx5en: add IPSEC_OFFLOAD support
Right now, only IPv4 transport mode, with aes-gcm ESP, is supported.
Driver also cooperates with NAT-T, and obeys socket policies, which
makes IKEd like StrongSwan working.

Sponsored by:	NVIDIA networking
2024-07-30 18:00:04 +03:00
Zhenlei Huang
aa3860851b net: Remove unneeded NULL check for the allocated ifnet
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().

No functional change intended.

Reviewed by:	kp, imp, glebius, stevek
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D45740
2024-06-28 18:16:29 +08:00
Doug Moore
dc048255b3 mlx5: use roundup_pow_of_two
Use roundup_pow_of_two in place of an expression.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D45536
2024-06-24 02:22:52 -05:00
Kristof Provost
7deadea2eb mlx5: handle vlan PF restrictions
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D45427
2024-06-17 10:50:16 +02:00
Doug Moore
f0a0420dfd powerof2: replace loops with fls or ilog2
In several places, a loop tests for powers of two, or iterates through
powers of two.  In those places, replace the loop with an invocation
of fls or ilog2 without changing the meaning of the code.

Reviewed by:	alc, markj, kib, np, erj, avg (previous version)
Differential Revision:	https://reviews.freebsd.org/D45494
2024-06-12 05:00:48 -05:00
Zhenlei Huang
2439ae9483 mlx4, mlx5: Eliminate redundent NULL check for packet filter
mlx4 and mlx5 are Ethernet devices and ether_ifattach() does an
unconditional bpfattach(). From commit 16d878cc99 [1] and on, we
should not check ifp->if_bpf to tell us whether or not we have any bpf
peers that might be interested in receiving packets. And since commit
2b9600b449 [2], ifp->if_bpf can not be NULL even after the network
interface has been detached.

No functional change intended.

1. 16d878cc99 Fix the following bpf(4) race condition which can result in a panic
2. 2b9600b449 Add dead_bpf_if structure, that should be used as fake bpf_if during ifnet detach

Reviewed by:	kp, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45196
2024-05-28 12:46:04 +08:00
Konstantin Belousov
c097967b9a mlx5en: add diagnostic in one more case of failed eeprom read preparation
Sponsored by:	Nvidia Networking
MFC after:	1 week
2024-05-06 06:15:35 +03:00
Mark Johnston
47a6fb9d5a mlx5: Zero DMA memory mlx5_alloc_cmd_msg() and alloc_cmd_page()
These functions may map more memory for DMA than is actually used, since
the allocator operates on multiples of a 4KB page size.  Thus,
bus_dmamap_sync() can trigger KMSAN reports when the unused portion of
a page is not zero-ed.

Reported by:	KMSAN
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D43133
2024-01-18 16:52:17 -05:00
Konstantin Belousov
987446fa39 mlx5(4): only detach IOV children if iov was successfully initialized
Reported by:	jwd
Sponsored by:	NVidia networking
MFC after:	1 week
2024-01-05 06:52:39 +02:00
Patrisious Haddad
0cd90ee598 mlx5: Fix HCA cap 2 query
Previously we were trying to set hca_cap_2 without checking if
sw_vhca_id_valid max value, which is the only settable value inside
hca_cap_2, and seeing that we dont have driver support for sw_vhca_id
yet there is no need to set hca_cap_2 at all, it is enough to query it.

Fixes: 7b959396ca ("mlx5: Introduce new destination type TABLE_TYPE")
MFC after:	3 days
2023-12-03 10:21:44 +02:00
Warner Losh
fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Martin Matuska
a592812327 mlx5_core: fix deadlock when using RXTLS
If removing a node of type FS_TYPE_FLOW_DEST we lock the flow group too
late. This can lead to a deadlock with fs_add_dst_fg().

PR:		274715
MFC after:	1 week
Reviewed by:	kib
Tested by:	mm
Differential Revision: https://reviews.freebsd.org/D42368
2023-11-16 12:17:41 +01:00
Konstantin Belousov
97beac79ed mlx5core: add linux/bitops.h include for mlx5_ifc.h use of BIT()
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:17 +02:00
Patrisious Haddad
7b959396ca mlx5: Introduce new destination type TABLE_TYPE
This new destination type supports flow transition between different
table types, e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.

In addition add driver support to be able to query the capability for
this new destination type.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:17 +02:00
Konstantin Belousov
b94ef2a3bc mlx5ib: adjust for the mlx5_create_auto_grouped_flow_table() interface change
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:17 +02:00
Mark Bloch
ad74454131 mlx5: add ability to attach flow counter to steering rule
Expose a way to attach a counter to a flow rule.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:17 +02:00
Raed Salem
35bbcf0916 mlx5: add fs_counters
Signed-off-by: Raed Salem <raeds@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:17 +02:00
Mark Bloch
6a6af22b6e mlx5: Add a no-append flow insertion mode
This allows to insert a rule and make sure it doesn't get
combined by the steering layer with any other rule.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
0a5db6bb3a net/mlx5: Allow creating autogroups with reserved entries
Exclude the last n entries for an autogrouped flow table.

Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create

Linux upstream commit: 79cdb0aaea8b5478db34afa1d4d5ecc808689a67
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
04db54fe43 net/mlx5: Fix auto group size calculation
Once all the large flow groups (defined by the user when the flow table
is created - max_num_groups) were created, then all the following new
flow groups will have only one flow table entry, even though the flow table
has place to larger groups.
Fix the condition to prefer large flow group.

Upstream Linux commit: 97fd8da281f80e7e69e0114bc906575734d4dfaf
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
76ed99ed8a mlx5: Use software enum in APIs instead of PRM
Users of the steering APIs shouldn't use the PRM directly.
Create an software enum to be used instead.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
45e2e55df6 mlx5: Add packet reformat support to flow rules
Allow attaching a packet reformat action to a flow rule.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
847542c60c mlx5: Add modify header support to flow rules
Allow attaching a modify header to a flow rule.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
cb054a493a mlx5: Refactor flow actions into a struct
Create a struct to hold flow actions to be used when creating
a flow rule.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00
Mark Bloch
bb4645b95b mlx5: Add packet reformat allocation support
Add support to allocating a packet reformat context.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Sponsored by:	NVidia networking
MFC after:	1 week
2023-11-16 01:08:16 +02:00