Create a wrapper for newbus to take giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D31831
(cherry picked from commit c6df6f5322)
If the driver_version capability bit is enabled, send the driver
version to firmware after the init HCA command, for display purposes.
Example of driver version: "FreeBSD,mlx5_core,14.0.0,3.x-xxx"
Linux commits:
012e50e109fd27ff989492ad74c50ca7ab21e6a1
Sponsored by: NVIDIA Networking
(cherry picked from commit e6d7ac1d03)
Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets
are sent back to the vport. A unicast loopback packet is the packet
with destination MAC address the same as the source MAC address. For
multicast, the destination MAC address is in the vport's multicast
filter list.
Moreover, the local loopback is not needed if there is one or none
user space context.
After this patch, the raw ethernet unicast and multicast local
loopback are disabled by default. When there is more than one user
space context, the local loopback is enabled.
Note that when local loopback is disabled, raw ethernet packets are
not looped back to the vport and are forwarded to the next routing
level (eswitch, or multihost switch, or out to the wire depending on
the configuration).
Linux commits:
c85023e153e3824661d07307138fdeff41f6d86a
8978cc921fc7fad3f4d6f91f1da01352aeeeff25
Sponsored by: NVIDIA Networking
(cherry picked from commit ea00d7e8ca)
This change adds convenience functions to setup a flow steering rule based on
a TCP socket. The helper function gets all the address information from the
socket and returns a steering rule, to be used with HW TLS RX offload.
Sponsored by: NVIDIA Networking
(cherry picked from commit 2c0ade806a)
This namespace will be used for TCP offloads, like hardware decryption
of TLS TCP data.
Sponsored by: NVIDIA Networking
(cherry picked from commit 0ee1b09eaa)
Add support to map an SQ to a specific schedule queue using a
special WQE as performance enhancement.
SQ remap operation is handled by a privileged internal queue, IQ,
and the mapping is enabled from one rate to another.
The transition from paced to non-paced should however always go
through FW.
Sponsored by: NVIDIA Networking
(cherry picked from commit 266c81aae3)
While at it only output driver version to dmesg(8) when hardware is present.
Differential Revision: https://reviews.freebsd.org/D29100
MFC after: 1 week
Reviewed by: kib and markj
Sponsored by: NVIDIA Networking
(cherry picked from commit d2cbfbc57b)
Properly allocate all mlx5en(4) structures from correct numa domain.
While at it cleanup unused numa domain integers deriving from the
Linux version of mlx5en(4).
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit 7c3eff94bd)
Make sure the "uid" field gets properly set when destroying DCT and QP
objects by making a copy of the field when creating such objects.
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit cbf6911e10)
RoCE is short for Remote direct memory access over Converged Ethernet.
ECN is short for Explicit Congestion Notification.
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit 8abf5ac0e6)
Querying the PCI config space for offline for every firmware command blocks
the PCI bus and affects performance. Especially for packet pacing and TLS
when objects are frequently created and destroyed.
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit e787b5acb1)
Make the mlx5e_mode_table[] array one dimensional, because there is only
one entry, 10G ER/LR, which share the same protocol bit.
This patch only adds support for basic sub-type distinguishing for the
extended protocol bits. Use verbose ifconfig eeprom output to get actual
media type.
Remove write only "connector_type" variable while at it.
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit a888087fba)
-pci_get_class : This function search for a matching pci device based on
the class/subclass and returns a newly created pci_dev.
- pci_{save,restore}_state : This is analogous to ours with the same name
- pci_is_root_bus : Return true if this is the root bus
- pci_get_domain_bus_and_slot : This function search for a matching pci
device based on domain, bus and slot/function concat into a single
unsigned int (devfn) and returns a newly created pci_dev
- pci_bus_{read,write}_config* : Read/Write to the config space.
While here add some helper function to alloc and fill the pci_dev struct.
Reviewed by: hselasky, bz (older version)
Differential Revision: https://reviews.freebsd.org/D27550
This change include several changes as listed below all related to UAR.
UAR is a special PCI memory area where the so-called doorbell register and
blue flame register live. Blue flame is a feature for sending small packets
more efficiently via a PCI memory page, instead of using PCI DMA.
- All structures and functions named xxx_uuars were renamed into xxx_bfreg.
- Remove partially implemented Blueflame support from mlx5en(4) and mlx5ib.
- Implement blue flame register allocator.
- Use blue flame register allocator in mlx5ib.
- A common UAR page is now allocated by the core to support doorbell register
writes for all of mlx5en and mlx5ib, instead of allocating one UAR per
sendqueue.
- Add support for DEVX query UAR.
- Add support for 4K UAR for libmlx5.
Linux commits:
7c043e908a74ae0a935037cdd984d0cb89b2b970
2f5ff26478adaff5ed9b7ad4079d6a710b5f27e7
0b80c14f009758cefeed0edff4f9141957964211
30aa60b3bd12bd79b5324b7b595bd3446ab24b52
5fe9dec0d045437e48f112b8fa705197bd7bc3c0
0118717583cda6f4f36092853ad0345e8150b286
a6d51b68611e98f05042ada662aed5dbe3279c1e
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
- call pci_iov_detach() on detaching from PCI device to take care of hang
on destroying VFs after PF is down.
- disable eswitch SRIOV support right after pci_iov_detach(),
else the eswitch cleanup sometimes occur while the SRIOV flow table
is still present.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.
The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.
Linux commit:
e355477ed9e4f401e3931043df97325d38552d54
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Report EQE data upon CQ completion to let upper layers use this data.
Linux commit:
4e0e2ea1886afe8c001971ff767f6670312a9b04
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Linux commit:
38164b771947be9baf06e78ffdfb650f8f3e908e
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.
Linux commit:
c5ae1954c47d3fd8815bd5a592aba18702c93f33
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Cleanup all host resources, SYSCTLs, MSIX vectors and memory used
by the host and only leave the device allocated memory behind, if any,
because it may still be in use, when the PCI remove function is called.
Else future probe calls may fail due to SYSCTLs already existing.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
PDDR (Port Diagnostics Database Register) is used to read the physical
layer debug database, which contains helpful troubleshooting information
regarding the state of the link.
PDDR register can only be queried when PCAM register reports it as
supported in its register mask. A new helper macro was added to
the MLX5_CAP_* infrastructure in order to access this mask.
Sponsored by: Mellanox Technologies - Nvidia
MFC after: 1 week
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.
Fix this by using module_xxx_order() instead of module_xxx().
Differential Revision: https://reviews.freebsd.org/D23973
MFC after: 1 week
Sponsored by: Mellanox Technologies
Use fence instead of barrier, which is optimized to take advantage of
the x86 TSO memory model.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week