These registers have read side effects and a read at just the right
(wrong?) time can trash some internal hw state.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit f13920b39b)
The firmwares and the following changelog are from the "Chelsio Unified
Wire v3.15.0.0 for Linux."
Version : 1.26.2.0
Date : 09/24/2021
====================
FIXES
-----
BASE:
- Added support for SFP+ RJ45 (0x1C).
- Fixing backward compatibility issue with older drivers when multiple
speeds are passed to firmware.
OFLD:
- Do not touch tp_plen_max if driver is supplying tp_plen_max. This
fixes a connection reset issue in iscsi.
ENHANCEMENTS
------------
BASE:
- Firmware header modified to add firmware binary signature.
Sponsored by: Chelsio Communications
(cherry picked from commit 45d6fbaec2)
Changes since 1.25.6.0 are listed here. This list comes from the
Release Notes for "Chelsio Unified Wire 3.14.0.4 for Linux" dated
2021-07-08.
Fixes
-----
BASE:
- Wait 5ms before and after the i2c command that clears the mod_select.
This fixes incorrect port module type read from i2c.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit 3c900106ea)
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
Sponsored by: Chelsio Communications
(cherry picked from commit ec8004dd41)
Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
Sponsored by: Chelsio Communications
(cherry picked from commit 6beb67c7e0)
Use the correct SGL limit within iw_cxgbe, firmwares >= 1.25.6.0 support
upto 512 entries per MR.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit 211972cfb8)
Changes since 1.25.0.0 are listed here. This list comes from the
Release Notes for the "Chelsio Unified Wire v3.14.0.3 for Linux"
release dated 2021-05-21.
Fixes
-----
BASE:
- Fixed Back to back T6 100G-CR4 link coming up with NO FEC sometimes.
- [T5] Try to bring up link in 1G speed if link doesn't come up on 10G.
- Fixed a bug to not allow BaseR fec in 100G speed.
- Fixed linkup issues on BT adapter in 1G and 100M speed.
- Fixed an issue to allow driver to send VI_ENABLE multiple times (once
with rx disable and then later rx enable).
- Fixed rate limiting not working on class number 16 to 30.
- Fixed backward compatibility issue in port type interpretation with vpd
version 0x80.
ETH:
- Fixed a case when firmware failed to deliver NIC WR completion to host.
- No rate limit support for WR ETH_TX_PKTS2 due to performance reasons.
OFLD
- Fixed a connection hang in SO adapters when tp_plen_max (set by driver)
is more than the window size.
- Added fw_filter_vnic_mode to firmware API file (t4fw_interface.h)
- Use correct rx channel in coprocessor crypto completion (CPL_FW6_PLD). This
was causing out of order completion to host.
FOiSCSI
- Fixed a crash due to unaligned access of ipv6 address.
- Fixed a crash during lun reset.
Enhancements
------------
ETH:
- Rate limiting support added for encapsulated (vxlan, nvgre, geneve) NIC TCP
packets.
OFLD:
- More than 128 SGLs supported in FW_RI_FR_NSMR_WR. Now, more than 16GB
(upto 64GB) of PBLs can be written with single FW_RI_FR_NSMR_WR.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit e0fa04e257)
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit 5ef87bf8b6)
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
Sponsored by: Chelsio Communications
(cherry picked from commit 24b98f288d)
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual suspend and resume. */
# devctl suspend t6nex0
# devctl resume t6nex0
/* Manual reset. */
# devctl reset t6nex0
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
(cherry picked from commit 83b5cda106)
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
Sponsored by: Chelsio Communications
(cherry picked from commit 43bbae1948)
For SCSI data underrun is a part of normal life. It should not be
reported as error. This fixes MODE SENSE used by modern CAM.
MFC after: 1 month
(cherry picked from commit e8144a13e0)
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.
PR: 252445
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by: marcus
MFC after: 1 week
Sponsored by: VMware
Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().
Reviewed by: marcus
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32016
(cherry picked from commit 0f14bcbe38)
Looking for "tsc-tsc" in the pmu tables will fail every time. Instead,
make this an alias for the static TSC event defined in pmc_events.h.
This fixes 'pmcstat -s cycles' on Intel and AMD.
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32197
(cherry picked from commit 937539e0a3)
It fixes lock ordere reversal between SIM and device locks. Also
remove registration for AC_FOUND_DEVICE, unused for a while now.
MFC after: 1 month
(cherry picked from commit 02d8194012)
This is useful for diagnosing problems. In particular, the errata
sheets identify the EEPROM version for many fixes.
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333
(cherry picked from commit 293663f4da)
Otherwise results in KASSERT with debug kernels because we rely on the
iflib CTX lock to implement the software serialization to the NVM model
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333
(cherry picked from commit 9b3e252e59)
This is the same underlying problem as 2624598064, just for bus ranges
rather than windows. SiFive's HiFive Unmatched has the following
topology:
Root Port <---> Bridge <---> Bridge <-+-> Bridge <---> (Unused)
(pcib0) (pcib1) (pcib2) | (pcib3)
+-> Bridge <---> xHCI
| (pcib4)
+-> Bridge <---> M.2 E-key
| (pcib5)
+-> Bridge <---> M.2 M-key
| (pcib6)
+-> Bridge <---> x16 slot
(pcib7)
If a device is plugged into the x16 slot that itself has a bridge, such
as many graphics cards, we currently fail to allocate a bus number for
its child bus (and so pcib_attach_child skips adding a child bus for
further enumeration) as, when the new child bridge attaches, it attempts
to allocate a bus number from its parent (pcib7) which in turn attempts
to grow its own bus range by calling bus_adjust_resource on its own
parent (pcib2) whose bus rman cannot accommodate the request and needs
to itself be extended by calling its own parent (pcib1). Note that
pcib3-7 do not face the same issue when they attach since pcib1 ends up
managing bus numbers 1-255 from the beginning and so never needs to grow
its own range.
Reviewed by: jhb, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32011
(cherry picked from commit 31776afdc7)
Drop arguments of function prototypes since the file is mixed between
listing arg names and not.
No functional changes
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D32329
(cherry picked from commit 28ccd780a9)
ena: Remove redundant declaration of ena_log_level.
GCC6 raises a -Wredundant-decl error due to duplicate declarations
in ena_fbsd_log.h and ena_plat.h.
Sponsored by: Chelsio Communications
(cherry picked from commit 8843787aa1)
ena: Avoid unnecessary mbuf collapses for LLQ condition
In case of Low-latency Queue, one small enough descriptor can be pushed
directly to the ENA hw, thus saving one fragment. Check for this
condition before performing collapse.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit c81f8c2611)
ena: Trigger reset on ena_com_prepare_tx failure
All ena_com_prepare_tx errors other than ENA_COM_NO_MEM are fatal and
require device reset.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 36130d2979)
ena: Prevent reset after device destruction
Check for ENA_FLAG_TRIGGER_RESET inside a locked context in order to
avoid potential race conditions with ena_destroy_device. This aligns the
reset task logic with the Linux driver.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 433ab9b698)
ena: Add extra log messages
Stay aligned with the Linux driver by adding the following logs:
* inform the user about retrying queue creation
* warn on non-empty ena_tx_buffer.mbuf prior to ena_tx_map_mbuf
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 77160654a1)
ena: Add locking assertions
ENA silently assumed that ena_up, ena_down and ena_start_xmit routines
should be called within locked context. Driver's logic heavily assumes
on concurrent access to those routines, so for safety and better
documentation about this assumption, the locking assertions were added
to the above functions.
The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit cb98c439d6)
ena: Move RSS logic into its own source files
Delegate RSS related functionality into separate .c/.h files in
preparation for the full RSS support.
While at it, reorder functions and remove prototypes for ones with
internal linkage.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 986e7b9227)
ena: Disable meta descriptor caching for netmap
If LLQ is being used, `ena_tx_ctx.meta_valid` must stay enabled. This
fixes netmap support on latest generation ENA HW and aligns it with the
core driver behavior.
As netmap doesn't support any csum offloads, the
`adapter->disable_meta_caching` value can be simply passed to the HW.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit a831466830)
ena: Share ena_global_lock between driver instances
In order to use `ena_global_lock` in sysctl context, it must be kept
outside the driver instance's software context, as sysctls can be called
before attach and after detach, leading to lock use before sx_init and
after sx_destroy otherwise.
Solve this issue by turning `ena_global_lock` into a file scope
variable, shared between all instances of the driver and associated
sysctl context, and in turn initialized/destroyed in dedicated
SYSINIT/SYSUNINIT functions.
As a side effect, this change also fixes existing race in the reset
routine, when simultaneously accessing sysctl exposed properties.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 07aff471c0)
ena: Add missing statistics
Provide the following sysctl statistics in order to stay aligned with
the Linux driver:
* rx_ring.csum_good
* tx_ring.unmask_interrupt_num
Also rename the 'bad_csum' statistic name to 'csum_bad' for alignment.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 223c8cb12e)
ena: Implement full RSS reconfiguration
Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.
Introduce sysctls for the following RSS functionality:
- rss.indir_table: indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key: RSS hash key (if Toeplitz used)
Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.
Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 6d1ef2abd3)
ena: fix building in-kernel driver
When building ENA as compiled into the kernel, the driver would fail to
build. Resolve the problem by introducing the following changes:
1. Add missing `ena_rss.c` entry in `sys/conf/files`.
2. Prevent SYSCTL_ADD_INT from throwing an assert due to an extra
CTLTYPE_INT flag.
Fixes: 986e7b9227 ("ena: Move RSS logic into its own source files")
Fixes: 6d1ef2abd3 ("ena: Implement full RSS reconfiguration")
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
(cherry picked from commit a3f0d18237)
ena: Update driver version to v2.4.1
Some of the changes in this release:
* Hardware RSS hash key reconfiguration and indirection table
reconfiguration support.
* Full kernel RSS support.
* Extra statistic counters.
* Netmap support for ENAv3.
* Locking assertions.
* Extra log messages.
* Reset handling fixes.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 42c7760be3)
ena: change ENA C++-style comment into C-style
According to man style(9), only C-style comments should be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 438c9e3cf8)
ena: add support for the large LLQ headers in ENA
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.
In case the device isn't supporting this feature, the default value (96B)
will be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit beaadec9ea)
ena: remove surplus NULL checks when freeing ENA resources
Calling free on a NULL pointer is valid, as appropriate check is already
done internally:
/* free(NULL, ...) does nothing */
if (addr == NULL)
return;
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit ddec69e6a7)
ena: hide sysctl nodes for unused ENA queues
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.
Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.
NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 0e7d31f63b)
Merge tag 'vendor/ena-com/2.4.0'
Update the driver in order not to break its compilation
and make use of the new ENA logging system
Migrate platform code to the new logging system provided by ena_com
layer.
Make ENA_INFO the new default log level.
Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
(cherry picked from commit 3fc5d816f8)
Update ENA driver man page
Bring the obsolete man page up to date:
* update diagnostic error messages
* add documentation of loader tunables
* document netmap support
* add a driver history section
* update the contact information
Submitted by: Artur Rojek <ar@semihalf.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit e34856a2c4)
Update ENA version to v2.4.0
Some of the changes in this release:
* Large LLQ headers,
* Bug/stability fixes,
* Change of the README/Documentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 93f0df457b)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f94acf52a4)
When a usb device is detached, usb_pc_dmamap_destroy() called
bus_dmamap_destroy() while the map was still loaded. That's harmless on x86
architectures, but on all other platforms it causes bus_dmamap_destroy() to
return EBUSY and leak away any memory resources (including bounce buffers)
associated with the mapping, as well as any allocated map structure itself.
This change introduces a new is_loaded flag to the usb_page_cache struct to
track whether a map is loaded or not. If the map is loaded,
bus_dmamap_unload() is called before bus_dmamap_destroy() to avoid leaking
away resources.
Differential Revision: https://reviews.freebsd.org/D32208
(cherry picked from commit dc91a9715f)
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
(cherry picked from commit 5e79bba562)
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
For stable/13, the read_rate_increment symbol remains as a stub to avoid
breaking loadable random modules.
(cherry picked from commit 6895cade94)