These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9feff969a0)
If the driver_version capability bit is enabled, send the driver
version to firmware after the init HCA command, for display purposes.
Example of driver version: "FreeBSD,mlx5_core,14.0.0,3.x-xxx"
Linux commits:
012e50e109fd27ff989492ad74c50ca7ab21e6a1
Sponsored by: NVIDIA Networking
(cherry picked from commit e6d7ac1d03)
Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets
are sent back to the vport. A unicast loopback packet is the packet
with destination MAC address the same as the source MAC address. For
multicast, the destination MAC address is in the vport's multicast
filter list.
Moreover, the local loopback is not needed if there is one or none
user space context.
After this patch, the raw ethernet unicast and multicast local
loopback are disabled by default. When there is more than one user
space context, the local loopback is enabled.
Note that when local loopback is disabled, raw ethernet packets are
not looped back to the vport and are forwarded to the next routing
level (eswitch, or multihost switch, or out to the wire depending on
the configuration).
Linux commits:
c85023e153e3824661d07307138fdeff41f6d86a
8978cc921fc7fad3f4d6f91f1da01352aeeeff25
Sponsored by: NVIDIA Networking
(cherry picked from commit ea00d7e8ca)
This change adds convenience functions to setup a flow steering rule based on
a TCP socket. The helper function gets all the address information from the
socket and returns a steering rule, to be used with HW TLS RX offload.
Sponsored by: NVIDIA Networking
(cherry picked from commit 2c0ade806a)
This namespace will be used for TCP offloads, like hardware decryption
of TLS TCP data.
Sponsored by: NVIDIA Networking
(cherry picked from commit 0ee1b09eaa)
Previously flow steering tables and rules were only created and destroyed
at link up and down events, respectivly. Due to new requirements for adding
TLS RX flow tables and rules, the main flow steering table must always be
available as there are permanent redirections from the TLS RX flow table
to the vlan flow table.
Sponsored by: NVIDIA Networking
(cherry picked from commit e059c120b4)
Add a refcount for posted WQEs to avoid a race between
post WQE and FW command flows.
Sponsored by: NVIDIA Networking
(cherry picked from commit a8e715d21b)
All packets must go through the indirection table, RQT,
because it is not possible to modify the RQN of the TIR
for direct dispatchment after it is created, typically
when the link goes up and down.
Sponsored by: NVIDIA Networking
(cherry picked from commit 06c2bd1872)
Add support to map an SQ to a specific schedule queue using a
special WQE as performance enhancement.
SQ remap operation is handled by a privileged internal queue, IQ,
and the mapping is enabled from one rate to another.
The transition from paced to non-paced should however always go
through FW.
Sponsored by: NVIDIA Networking
(cherry picked from commit 266c81aae3)
Internal send queues are regular sendqueues which are reserved for WQE commands
towards the hardware and firmware. These queues typically carry resync
information for ongoing TLS RX connections and when changing schedule queues
for rate limited connections.
The internal queue, IQ, code is more or less a stripped down copy
of the existing SQ managing code with exception of:
1) An optional single segment memory buffer which can be read or
written as a whole by the hardware, may be provided.
2) An optional completion callback for all transmit operations, may
be provided.
3) Does not support mbufs.
Sponsored by: NVIDIA Networking
(cherry picked from commit 694263572f)
The TLS RX support also needs to be able to allocate DEK objects.
Share the available objects 1:1.
Sponsored by: NVIDIA Networking
(cherry picked from commit 75767cb889)
Allocate the RQT once, pointing all initial entries to the drop RQN.
When opening the channels simplify modify the RQT, directing all traffic
to the new RQNs. Similarly when closing the channels point all RQT entries
back to the so-called drop RQN.
Sponsored by: NVIDIA Networking
(cherry picked from commit 33a6a7a72a)
What is a drop RQ and why is it needed?
The RSS indirection table, also called the RQT, selects the
destination RQ based on the receive queue number, RQN. The RQT is
frequently referred to by flow steering rules to distribute traffic
among multiple RQs. The problem is that the RQs cannot be destroyed
before the RQT referring them is destroyed too. Further, TLS RX
rules may still be referring to the RQT even if the link went
down. Because there is no magic RQN for dropping packets, we create
a dummy RQ, also called drop RQ, which sole purpose is to drop all
received packets. When the link goes down this RQN is filled in all
RQT entries, of the main RQT, so the real RQs which are about to be
destroyed can be released and the TLS RX rules can be sustained.
Sponsored by: NVIDIA Networking
(cherry picked from commit 27b778ae55)
During packet reception the network stack frequently transmit data in
response to TCP window updates. To reduce the number of transmit doorbells
needed, inhibit all transmit doorbells designated for the same channel until
after the reception of packets for the given channel is completed.
While at it slightly refactor the mlx5e_tx_notify_hw() function:
1) The doorbell information is always stored into sq->doorbell.d64 .
No need to pass a separate pointer to this variable.
2) Move checks for skipping doorbell writes inside this function.
Sponsored by: NVIDIA Networking
(cherry picked from commit 2d5e5a0d75)
In my understanding this is only needed to workaround lost interrupts.
I was thinking to remove it completely, but the comment about edge-
triggered interrupt may be true and needs deeper investigation. ~1Hz
should be often enough to handle the supposedly rare loss cases, but
rare enough to not appear in top. Add sysctl hw.atkbd.hz to tune it.
MFC after: 1 month
(cherry picked from commit 9e007a88d6)
Flip dwwdt_prevent_restart to false. What's the use of a watchdog if it
does not restart a hung system?
Add a knob for panic-ing on the first timeout, resetting on the second
one. This can be useful if interrupts can still work, otherwise a reset
recovers a system without any aid for debugging the hang.
The change also doubles the timeout that's programmed into the hardware.
The previous version of the code always had the interrupt on the first
timeout enabled, but it took no action on it. Only the second timeout
could be configured to reset the system. So, the hardware timeout was
set to a half of the user requested timeout. But now,we can take a
corrective action on the first timeout, so we use the user requested
timeout.
While here, define boolean sysctl-s as such.
(cherry picked from commit ee900888c4)
Match igb(4) as in f7926a6d0c. From Vincenzo, this check is redundant
to setup providing us an IGC_RXD_STAT_VP bit and would make for an
unexpected condition if IFCAP_VLAN_HWTAGGING were not set but the tag
was stripped, which would be passed up the stack breaking isolation.
PR: 260068
Approved by: vmaffione
(cherry picked from commit b4a58b3d58)
vt_fini_logos() calls vtbuf_grow(), which reallocates the console
window's buffer using malloc(M_WAITOK). Because vt_fini_logos() is
called via a callout, we end up panicking if INVARIANTS is enabled.
Fix the problem simply by clearing the logos using a timed taskqueue.
taskqueue_thread is formally allowed to sleep; of course, if we actually
end up sleeping to satisfy the allocation, then we have bigger problems.
PR: 260896
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6c7e4d72b1)
The logic that sets iri_vtag and M_VLANTAG does not handle the
case where the 802.11q VLAN tag is 0. Fix this issue across
the iflib drivers. While there, also improve and align the
VLAN tag check extraction, by moving it outside the RX descriptor
loop, eliminating a local variable and additional checks.
PR: 260068
Reviewed by: kbowling, gallatin
Reported by: erj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33156
(cherry picked from commit f7926a6d0c)
Since isc_capenable (private copy of ifp->if_capenable) is
now synchronized to if_capenable, use it in the drivers
when checking the IFCAP_* bits.
This results in better cache usage and avoids indirection
through the ifp pointer.
PR: 260068
Reviewed by: kbowling, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33156
(cherry picked from commit 52f45d8ace)
sddadump has been derived from sddastart.
mmc_sim interface has grown a new method, cam_poll, to support polled
operation.
mmc_sim code has been changed to provide a sim_poll hook only if the
controller implements the new method. The hooks is implemented in terms
of the new mmc_sim_cam_poll method.
Additionally, in-progress CCB-s now have CAM_REQ_INPROG status to
satisfy xpt_pollwait().
mmc_sim_cam_poll method has been implemented in dwmmc host controller.
Relnotes: perhaps
(cherry picked from commit 44682688f0)
If a counter more than overflows just as we read it on switch out then,
if using sampling mode, we will negate this small value to give a huge
reload count, and if we later switch back in that context we will
validate that value against pm_reloadcount and panic an INVARIANTS
kernel with:
panic: [pmc,1470] pmcval outside of expected range cpu=2 ri=16 pmcval=fffff292 pm_reloadcount=10000
or similar. Presumably in a non-INVARIANTS kernel we will instead just
use the provided value as the reload count, which would lead to the
overflow not happing for a very long time (e.g. 78 minutes for a 48-bit
counter incrementing at an averate rate of 1GHz).
Instead, clamp the reload count to 0 (which corresponds precisely to the
value we would compute if it had just overflowed and no more), which
will result in hwpmc using the full original reload count again. This is
the approach used by core for Intel (for both fixed and programmable
counters).
As part of this, armv7 and arm64 are made conceptually simpler; rather
than skipping modifying the overflow count for sampling mode counters so
it's always kept as ~0, those special cases are removed so it's always
applicable and the concatentation of it and the hardware counter can
always be viewed as a 64-bit counter, which also makes them look more
like other architectures.
Whilst here, fix an instance of UB (shifting a 1 into the sign bit) for
amd in its sign-extension code.
Reviewed by: andrew, mhorne, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33654
(cherry picked from commit e74c7ffcb1)