Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
Sponsored by: Chelsio Communications
(cherry picked from commit 6beb67c7e0)
Use the correct SGL limit within iw_cxgbe, firmwares >= 1.25.6.0 support
upto 512 entries per MR.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit 211972cfb8)
Changes since 1.25.0.0 are listed here. This list comes from the
Release Notes for the "Chelsio Unified Wire v3.14.0.3 for Linux"
release dated 2021-05-21.
Fixes
-----
BASE:
- Fixed Back to back T6 100G-CR4 link coming up with NO FEC sometimes.
- [T5] Try to bring up link in 1G speed if link doesn't come up on 10G.
- Fixed a bug to not allow BaseR fec in 100G speed.
- Fixed linkup issues on BT adapter in 1G and 100M speed.
- Fixed an issue to allow driver to send VI_ENABLE multiple times (once
with rx disable and then later rx enable).
- Fixed rate limiting not working on class number 16 to 30.
- Fixed backward compatibility issue in port type interpretation with vpd
version 0x80.
ETH:
- Fixed a case when firmware failed to deliver NIC WR completion to host.
- No rate limit support for WR ETH_TX_PKTS2 due to performance reasons.
OFLD
- Fixed a connection hang in SO adapters when tp_plen_max (set by driver)
is more than the window size.
- Added fw_filter_vnic_mode to firmware API file (t4fw_interface.h)
- Use correct rx channel in coprocessor crypto completion (CPL_FW6_PLD). This
was causing out of order completion to host.
FOiSCSI
- Fixed a crash due to unaligned access of ipv6 address.
- Fixed a crash during lun reset.
Enhancements
------------
ETH:
- Rate limiting support added for encapsulated (vxlan, nvgre, geneve) NIC TCP
packets.
OFLD:
- More than 128 SGLs supported in FW_RI_FR_NSMR_WR. Now, more than 16GB
(upto 64GB) of PBLs can be written with single FW_RI_FR_NSMR_WR.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit e0fa04e257)
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit 5ef87bf8b6)
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
Sponsored by: Chelsio Communications
(cherry picked from commit 24b98f288d)
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual suspend and resume. */
# devctl suspend t6nex0
# devctl resume t6nex0
/* Manual reset. */
# devctl reset t6nex0
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
(cherry picked from commit 83b5cda106)
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
Sponsored by: Chelsio Communications
(cherry picked from commit 43bbae1948)
For SCSI data underrun is a part of normal life. It should not be
reported as error. This fixes MODE SENSE used by modern CAM.
MFC after: 1 month
(cherry picked from commit e8144a13e0)
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.
PR: 252445
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by: marcus
MFC after: 1 week
Sponsored by: VMware
Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().
Reviewed by: marcus
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32016
(cherry picked from commit 0f14bcbe38)
Looking for "tsc-tsc" in the pmu tables will fail every time. Instead,
make this an alias for the static TSC event defined in pmc_events.h.
This fixes 'pmcstat -s cycles' on Intel and AMD.
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32197
(cherry picked from commit 937539e0a3)
It fixes lock ordere reversal between SIM and device locks. Also
remove registration for AC_FOUND_DEVICE, unused for a while now.
MFC after: 1 month
(cherry picked from commit 02d8194012)
This is useful for diagnosing problems. In particular, the errata
sheets identify the EEPROM version for many fixes.
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333
(cherry picked from commit 293663f4da)
Otherwise results in KASSERT with debug kernels because we rely on the
iflib CTX lock to implement the software serialization to the NVM model
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333
(cherry picked from commit 9b3e252e59)
This is the same underlying problem as 2624598064, just for bus ranges
rather than windows. SiFive's HiFive Unmatched has the following
topology:
Root Port <---> Bridge <---> Bridge <-+-> Bridge <---> (Unused)
(pcib0) (pcib1) (pcib2) | (pcib3)
+-> Bridge <---> xHCI
| (pcib4)
+-> Bridge <---> M.2 E-key
| (pcib5)
+-> Bridge <---> M.2 M-key
| (pcib6)
+-> Bridge <---> x16 slot
(pcib7)
If a device is plugged into the x16 slot that itself has a bridge, such
as many graphics cards, we currently fail to allocate a bus number for
its child bus (and so pcib_attach_child skips adding a child bus for
further enumeration) as, when the new child bridge attaches, it attempts
to allocate a bus number from its parent (pcib7) which in turn attempts
to grow its own bus range by calling bus_adjust_resource on its own
parent (pcib2) whose bus rman cannot accommodate the request and needs
to itself be extended by calling its own parent (pcib1). Note that
pcib3-7 do not face the same issue when they attach since pcib1 ends up
managing bus numbers 1-255 from the beginning and so never needs to grow
its own range.
Reviewed by: jhb, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32011
(cherry picked from commit 31776afdc7)
Drop arguments of function prototypes since the file is mixed between
listing arg names and not.
No functional changes
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D32329
(cherry picked from commit 28ccd780a9)
ena: Remove redundant declaration of ena_log_level.
GCC6 raises a -Wredundant-decl error due to duplicate declarations
in ena_fbsd_log.h and ena_plat.h.
Sponsored by: Chelsio Communications
(cherry picked from commit 8843787aa1)
ena: Avoid unnecessary mbuf collapses for LLQ condition
In case of Low-latency Queue, one small enough descriptor can be pushed
directly to the ENA hw, thus saving one fragment. Check for this
condition before performing collapse.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit c81f8c2611)
ena: Trigger reset on ena_com_prepare_tx failure
All ena_com_prepare_tx errors other than ENA_COM_NO_MEM are fatal and
require device reset.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 36130d2979)
ena: Prevent reset after device destruction
Check for ENA_FLAG_TRIGGER_RESET inside a locked context in order to
avoid potential race conditions with ena_destroy_device. This aligns the
reset task logic with the Linux driver.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 433ab9b698)
ena: Add extra log messages
Stay aligned with the Linux driver by adding the following logs:
* inform the user about retrying queue creation
* warn on non-empty ena_tx_buffer.mbuf prior to ena_tx_map_mbuf
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 77160654a1)
ena: Add locking assertions
ENA silently assumed that ena_up, ena_down and ena_start_xmit routines
should be called within locked context. Driver's logic heavily assumes
on concurrent access to those routines, so for safety and better
documentation about this assumption, the locking assertions were added
to the above functions.
The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit cb98c439d6)
ena: Move RSS logic into its own source files
Delegate RSS related functionality into separate .c/.h files in
preparation for the full RSS support.
While at it, reorder functions and remove prototypes for ones with
internal linkage.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 986e7b9227)
ena: Disable meta descriptor caching for netmap
If LLQ is being used, `ena_tx_ctx.meta_valid` must stay enabled. This
fixes netmap support on latest generation ENA HW and aligns it with the
core driver behavior.
As netmap doesn't support any csum offloads, the
`adapter->disable_meta_caching` value can be simply passed to the HW.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit a831466830)
ena: Share ena_global_lock between driver instances
In order to use `ena_global_lock` in sysctl context, it must be kept
outside the driver instance's software context, as sysctls can be called
before attach and after detach, leading to lock use before sx_init and
after sx_destroy otherwise.
Solve this issue by turning `ena_global_lock` into a file scope
variable, shared between all instances of the driver and associated
sysctl context, and in turn initialized/destroyed in dedicated
SYSINIT/SYSUNINIT functions.
As a side effect, this change also fixes existing race in the reset
routine, when simultaneously accessing sysctl exposed properties.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 07aff471c0)
ena: Add missing statistics
Provide the following sysctl statistics in order to stay aligned with
the Linux driver:
* rx_ring.csum_good
* tx_ring.unmask_interrupt_num
Also rename the 'bad_csum' statistic name to 'csum_bad' for alignment.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 223c8cb12e)
ena: Implement full RSS reconfiguration
Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.
Introduce sysctls for the following RSS functionality:
- rss.indir_table: indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key: RSS hash key (if Toeplitz used)
Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.
Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 6d1ef2abd3)
ena: fix building in-kernel driver
When building ENA as compiled into the kernel, the driver would fail to
build. Resolve the problem by introducing the following changes:
1. Add missing `ena_rss.c` entry in `sys/conf/files`.
2. Prevent SYSCTL_ADD_INT from throwing an assert due to an extra
CTLTYPE_INT flag.
Fixes: 986e7b9227 ("ena: Move RSS logic into its own source files")
Fixes: 6d1ef2abd3 ("ena: Implement full RSS reconfiguration")
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
(cherry picked from commit a3f0d18237)
ena: Update driver version to v2.4.1
Some of the changes in this release:
* Hardware RSS hash key reconfiguration and indirection table
reconfiguration support.
* Full kernel RSS support.
* Extra statistic counters.
* Netmap support for ENAv3.
* Locking assertions.
* Extra log messages.
* Reset handling fixes.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 42c7760be3)
ena: change ENA C++-style comment into C-style
According to man style(9), only C-style comments should be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 438c9e3cf8)
ena: add support for the large LLQ headers in ENA
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.
In case the device isn't supporting this feature, the default value (96B)
will be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit beaadec9ea)
ena: remove surplus NULL checks when freeing ENA resources
Calling free on a NULL pointer is valid, as appropriate check is already
done internally:
/* free(NULL, ...) does nothing */
if (addr == NULL)
return;
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit ddec69e6a7)
ena: hide sysctl nodes for unused ENA queues
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.
Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.
NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 0e7d31f63b)
Merge tag 'vendor/ena-com/2.4.0'
Update the driver in order not to break its compilation
and make use of the new ENA logging system
Migrate platform code to the new logging system provided by ena_com
layer.
Make ENA_INFO the new default log level.
Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
(cherry picked from commit 3fc5d816f8)
Update ENA driver man page
Bring the obsolete man page up to date:
* update diagnostic error messages
* add documentation of loader tunables
* document netmap support
* add a driver history section
* update the contact information
Submitted by: Artur Rojek <ar@semihalf.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit e34856a2c4)
Update ENA version to v2.4.0
Some of the changes in this release:
* Large LLQ headers,
* Bug/stability fixes,
* Change of the README/Documentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 93f0df457b)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f94acf52a4)
When a usb device is detached, usb_pc_dmamap_destroy() called
bus_dmamap_destroy() while the map was still loaded. That's harmless on x86
architectures, but on all other platforms it causes bus_dmamap_destroy() to
return EBUSY and leak away any memory resources (including bounce buffers)
associated with the mapping, as well as any allocated map structure itself.
This change introduces a new is_loaded flag to the usb_page_cache struct to
track whether a map is loaded or not. If the map is loaded,
bus_dmamap_unload() is called before bus_dmamap_destroy() to avoid leaking
away resources.
Differential Revision: https://reviews.freebsd.org/D32208
(cherry picked from commit dc91a9715f)
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
(cherry picked from commit 5e79bba562)
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
For stable/13, the read_rate_increment symbol remains as a stub to avoid
breaking loadable random modules.
(cherry picked from commit 6895cade94)
This is a combination of 7 commits.
mgb: update Microchip URLs
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6b25b4a73f)
mgb: enable multicast in mgb_init
Receive Filtering Engine (RFE) configuration is not yet implemented,
and mgb intended to enable all broadcast, multicast, and unicast.
However, MGB_RFE_ALLOW_MULTICAST was missed (MGB_RFE_ALLOW_UNICAST was
included twice).
MFC after: 1 week
Fixes: 8890ab7758 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ecac5c2928)
mgb: Do not KASSERT on error in mgb_init
There's not much we can do if mii_mediachg() fails, but KASSERT is not
appropriate.
MFC after: 1 week
Fixes: 8890ab7758 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 8b889b8953)
mgb: Staticize devclass and iflib structs (as is typical)
MFC after: 1 week
Fixes: 8890ab7758 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c83ae596f3)
mgb: Apply some style(9)
Add parens around return values, rewrap lines
MFC after: 1 week
Fixes: 8890ab7758 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 820da5820e)
mgb: Fix DEBUG (and LINT) build
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 5f07d7fe40)
mgb: Fix nop admin interrupt handling
Previously mgb_admin_intr printed a diagnostic message if no interrupt
status bits were set, but it's not valid to call device_printf() from a
filter. Just drop the message as it has no user-facing value.
Also return FILTER_STRAY in this case - there is nothing further for
the driver to do.
Reviewed by: kbowling
MFC after: 1 week
Fixes: 8890ab7758 ("Introduce if_mgb driver...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32231
(cherry picked from commit 1ad2d87778)
In NTB gen3 driver, it was supposed to disable NTB bar access by
default, but due to incorrect register access method, the bar disable
logic does not work as expected. Those registers should be modified
through NTB bar0 rather than PCI configuration space.
Besides, we'd better to protect ourselves from a bad buddy node so
ingress disable logic should be implemented together.
Submitted by: Austin Zhang (austin.zhang@dell.com)
Sponsored by: Dell EMC
(cherry picked from commit e3cf7ebc1d)
In vt_kms, the postswitch callback restores fbdev mode when
panicking or entering the debugger. This ensures that even when
a graphical applicatino was running on the first tty, simple framebuffer
mode would be restored and the panic would be visible instead
of the frozen GUI. But vt wouldn't call the postswitch callback
when we're already on the first tty, so running a GUI on it
would prevent you from reading any panics.
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D29961
(cherry picked from commit c937a405bd)
This allows to avoid blocking on Giant in callout context, moving to
already existing dedicated taskqueue_pci_hp thread.
MFC after: 1 month
(cherry picked from commit fa3b03d378)
The code explicitly takes Giant when it accesses keyboard, and I see
no reason to take it globally by callout(9).
MFC after: 1 month
(cherry picked from commit da69c67526)