The maximum page size in the mkey context is 2GB.
Until today, we didn't enforce this requirement in the code, and therefore,
if we got a page size larger than 2GB, we have passed zeros in the
log_page_shift instead of the actual value and the registration failed.
This patch limits the driver to use compound pages of 2GB for mkeys.
Linux commit:
762f899ae7875554284af92b821be8c083227092
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:
First issues is related to alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.
This wasn't actually a bug because it was supported by all the
existing mlx5 compatible device, but we don't want to require
this support in all future devices.
Another issue is because the inner loop didn't advance PFN so
the test "if (base + p != pfn)" always failed for SGE with
len > (1<<page_shift).
Linux commit:
d67bc5d4e3e100d762c0f57ea67f28bc219698a6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
- Upon error more completion events than requested may be generated,
particularly when using the completion event factor feature.
- Count number of event errors in the transmit path.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.
Linux commit:
ca5b91d63192ceaa41a6145f8c923debb64c71fa
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make the mlx5e_mode_table[] array one dimensional, because there is only
one entry, 10G ER/LR, which share the same protocol bit.
This patch only adds support for basic sub-type distinguishing for the
extended protocol bits. Use verbose ifconfig eeprom output to get actual
media type.
Remove write only "connector_type" variable while at it.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
This code practically has not sleeping points, so Giant is locked for very
long time.
Noted and reviewed by: hselasky
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Import Linux commit 534b1204ca4694db1093b15cf3e79a99fcb6a6da
Add reserved mapping to cover all the register in order to avoid setting
arbitrary values to newer FW which implements the reserved fields.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
Import Linux commit ce28f0fd670ddffcd564ce7119bdefbaf08f02d3:
Add reserved mapping to cover all the register in order to avoid
setting arbitrary values to newer FW which implements the reserved
fields.
Taken from: https://patches.linaro.org/patch/417255/
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
In particular, avoid creating TIR or installing flow rules for VXLAN
if the capability is disabled.
Reported and reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Handlers maintain flow rules and inform hardware about non-standard VxLAN
port in use. The database of the vxlan end points is maintained.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
sys/dev/sound/pci/hda/hdaa_patches.c:
match_pin_patches: Use HDA_DEV_MATCH instead of regular ==
sys/dev/sound/pci/hda/pin_patch_realtek.h:
Add quirk for Lenovo laptops when ALC298 is used.
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.
The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.
Thanks to Mike Karels for testing and feedback on the driver.
Reviewed by: bcr (manpages), kbowling, scottl, erj
MFC after: 1 month
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30668
As in commit 831850d8b0, this routine can trigger false positives, so
exclude it from instrumentation.
Reported by: pho
Sponsored by: The FreeBSD Foundation
during suspend/resume cycle. Previously used bus_generic_suspend_intr and
bus_generic_resume_intr may cause interrupt storm because of missed
interrupt acknowledges caused by blocking of intr handler.
Reported by: J.R. Oldroyd <jr_AT_opal_DOT_com>
MFC after: 1 week
These have been found in some Arm ACPI tables generated by edk2, e.g.
when describing the pl011 uart on the Arm AEMv8 model.
Reviewed by: imp, jkim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31110
During the removal of cam_sim_alloc_dev() in
aeb04e88f5 for sdhci.c and the
follow-up build-fix in a72af82e31
slot->dev and slot->bus got mixed up for MMCCAM; slot->dev is
only used in the !MMCCAM case so is uninitialised here leading to
a panic; switch back to slot->bus to return to the status quo.
Reviewed by: imp (ack on arm@)
X-Differential Revision: https://reviews.freebsd.org/D30857
With the v5.13 device-tree update speed of the CPU switch port was
changed to 2.5G. Reflect that in the driver.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
New Samsung 980 SSDs report Namespace Preferred Write Alignment of
8 (4KB) and Namespace Preferred Write Granularity of 32 (16KB).
My quick tests show that 16KB is a minimal sequential write size
when the SSD reaches peak IOPS, so writing much less is very slow.
But writing slightly less or slightly more does not change much,
so it seems not so much a size granularity as minimum I/O size.
Thinking about different stripesize consumers:
- Partition alignment should be based on NPWA by definition.
- ZFS ashift in part of forcing alignment of all I/Os should also
be based on NPWA. In part of forcing size granularity, if really
needed, it may be set to NPWG, but too big value can make ZFS too
space-inefficient, and the 16KB is actually the biggest supported
value there now.
- ZFS recordsize/volblocksize could potentially be tuned up toward
NPWG to work as I/O size granularity, but enabled compression makes
it too fuzzy. And those are normally user-configurable things.
- ZFS I/O aggregation code could definitely use Optimal Write Size
value and may be NPWG, but we don't have fields in GEOM now to report
the minimal and optimal I/O sizes, and even maximal is not reported
outside GEOM DISK to be used by ZFS.
MFC after: 1 week
Make it work, but change the interface to be safe for non-root users. In
particular, right now interface only works for the tables which can be
minimally parsed by kernel to determine the table size. Then, userspace can
query the table size, after that it provides a buffer of needed size
and kernel copies out just table to userspace.
Main advantage is that user no longer need to be able to read /dev/mem,
the disadvantage is the need to have minimal parsers aware of the table
types. Right now the parsers are implemented for ESRT and PROP tables.
Future extension of the present interface might be a return of only
the table physical address, in case kernel does not have suitable
parser yet. Then, a privileged user could read the table from /dev/mem.
This extension, which logically equivalent to the old (non-worked)
EFIIOC_GET_TABLE variant, is not implemented until needed.
Submitted by: Pavel Balaev <pavel.balaev@3mdeb.com>
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30104
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.
To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent . In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
Reviewed by: jrtc27@, chuck@
Found by: jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31002
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.
Reviewed by: jrtc27@, mav@, chuck@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31001