As in commit 831850d8b0, this routine can trigger false positives, so
exclude it from instrumentation.
Reported by: pho
Sponsored by: The FreeBSD Foundation
during suspend/resume cycle. Previously used bus_generic_suspend_intr and
bus_generic_resume_intr may cause interrupt storm because of missed
interrupt acknowledges caused by blocking of intr handler.
Reported by: J.R. Oldroyd <jr_AT_opal_DOT_com>
MFC after: 1 week
These have been found in some Arm ACPI tables generated by edk2, e.g.
when describing the pl011 uart on the Arm AEMv8 model.
Reviewed by: imp, jkim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31110
During the removal of cam_sim_alloc_dev() in
aeb04e88f5 for sdhci.c and the
follow-up build-fix in a72af82e31
slot->dev and slot->bus got mixed up for MMCCAM; slot->dev is
only used in the !MMCCAM case so is uninitialised here leading to
a panic; switch back to slot->bus to return to the status quo.
Reviewed by: imp (ack on arm@)
X-Differential Revision: https://reviews.freebsd.org/D30857
With the v5.13 device-tree update speed of the CPU switch port was
changed to 2.5G. Reflect that in the driver.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
New Samsung 980 SSDs report Namespace Preferred Write Alignment of
8 (4KB) and Namespace Preferred Write Granularity of 32 (16KB).
My quick tests show that 16KB is a minimal sequential write size
when the SSD reaches peak IOPS, so writing much less is very slow.
But writing slightly less or slightly more does not change much,
so it seems not so much a size granularity as minimum I/O size.
Thinking about different stripesize consumers:
- Partition alignment should be based on NPWA by definition.
- ZFS ashift in part of forcing alignment of all I/Os should also
be based on NPWA. In part of forcing size granularity, if really
needed, it may be set to NPWG, but too big value can make ZFS too
space-inefficient, and the 16KB is actually the biggest supported
value there now.
- ZFS recordsize/volblocksize could potentially be tuned up toward
NPWG to work as I/O size granularity, but enabled compression makes
it too fuzzy. And those are normally user-configurable things.
- ZFS I/O aggregation code could definitely use Optimal Write Size
value and may be NPWG, but we don't have fields in GEOM now to report
the minimal and optimal I/O sizes, and even maximal is not reported
outside GEOM DISK to be used by ZFS.
MFC after: 1 week
Make it work, but change the interface to be safe for non-root users. In
particular, right now interface only works for the tables which can be
minimally parsed by kernel to determine the table size. Then, userspace can
query the table size, after that it provides a buffer of needed size
and kernel copies out just table to userspace.
Main advantage is that user no longer need to be able to read /dev/mem,
the disadvantage is the need to have minimal parsers aware of the table
types. Right now the parsers are implemented for ESRT and PROP tables.
Future extension of the present interface might be a return of only
the table physical address, in case kernel does not have suitable
parser yet. Then, a privileged user could read the table from /dev/mem.
This extension, which logically equivalent to the old (non-worked)
EFIIOC_GET_TABLE variant, is not implemented until needed.
Submitted by: Pavel Balaev <pavel.balaev@3mdeb.com>
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30104
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.
To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent . In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
Reviewed by: jrtc27@, chuck@
Found by: jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31002
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.
Reviewed by: jrtc27@, mav@, chuck@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31001
Put the { on the same line as the struct nvme_foo when we define these
structures. It's FreeBSD standard and these were inconsistent.
Sponsored by: Netflix
Subtract one SGE for the case of misaligned address. Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.
While there, remove number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).
MFC after: 1 month
Sponsored by: iXsystems, Inc.
phandle_t is a uint32_t type, <= 0 comparison doesn't work with it as intended.
This caused the ofw_pci code to attach to PCI bus on ACPI based systems.
Since 3eae4e106a ("Fix error value returned by ofw_bus_gen_get_node().")
ofw subsystem can only return -1 for invalid nodes. Use that.
MFC after: 4 weeks
Reviewed by: mw
Differential revision: https://reviews.freebsd.org/D30953
Make it possible to specify event codes without an offset of
PMC_EV_ARMV8_FIRST, by setting a machine-dependent flag. This is
required to make use of event definitions from pmu-events.
Reviewed by: ray (slightly earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30602
This will be used to detect supported pmu events. The expected format is
the MIDR register with the revision and variant fields masked. See also:
lib/libpmc/pmu-events/arch/arm64/mapfile.csv.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30601
Fix forgotten argument and type error. MMCCAM isn't enabled by default,
and I'd mistakenly thought it was, so these went undetected precommit.
Sponsored by: Netflix
xpt_bus_register and xpt_bus_deregister returns a hybrid error that's
neither a cam_status, nor an errno, but a mix of both. Update
xpt_bus_register and xpt_bus_deregister to return an errno. The vast
majority of current users compare against zero, which can also be
spelled CAM_SUCCESS. Nobody uses CAM_FAILURE, so remove that symbol
to prevent comfusion (nothing returns it either).
Where the return value is saved, ensure that the variable 'error' is
used to store an errno and 'status' is used to store a cam_status where
it makes the code clearer (usually just in functions that already mix
and match). Where the return value isn't used at all, avoid storing it
at all.
Reviewed by: scottl@, mav@ (earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30860
Use the new xpt_path_device to get the device_t using the periph's path.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30855
This makes it more usable in that dhclient will autolaunch from devd
now when cdce devices are plugged in.. It also sets the baudrate, but
this isn't exported via tools, and CDCE doesn't have a good way to
specify the media type, so there isn't a good way to tell userland
what the speed is currently...
Reviewed by: hps
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D30625
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Some of the changes in this release:
* Large LLQ headers,
* Bug/stability fixes,
* Change of the README/Documentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Update the driver in order not to break its compilation
and make use of the new ENA logging system
Migrate platform code to the new logging system provided by ena_com
layer.
Make ENA_INFO the new default log level.
Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.
Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.
NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Calling free on a NULL pointer is valid, as appropriate check is already
done internally:
/* free(NULL, ...) does nothing */
if (addr == NULL)
return;
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.
In case the device isn't supporting this feature, the default value (96B)
will be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
According to man style(9), only C-style comments should be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Implement support for the NXP LS1028A SoC MDIO controller.
It is attached to the internal PCI root complex.
The controller is used to communicate with PHYs of ports connected
to the internal switch.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30731
ENETC it a gigabit Ethernet controller found on the LS1028A board.
It supports basic VLAN offloads - tag extraction, injection and hardware
filtering. Inband MDIO connectivity is used for link status
monitoring through the miibus interface. Fixed-link mode is also
supported, which allows for operation of internal cpu to switch port.
Since no admin interrupts are present in hardware, link status polling
has to be used.
Due to a hardware bug software reset of the NIC results in a external
abort. Because of that most of the hardware initialization is done
during attach. This also means that in the case of an fatal error full
board reset is required.
The enetc_hw.h header was imporoted from Linux. It is dual licensed.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30729
Provide common MDIO code for two LS1028 ENETC controllers -
an external one found on the PCI bus and internal one found in ENETC.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30730
By definition ofw_bus_get_node() should consistently return -1 when there
is no associated OF node.
MFC after: 4 weeks
Discussed with: nwhitehorn
Analyzed in: https://reviews.freebsd.org/D30761
ddfc9c4c59 was missing changes to two files to complete the
bus_child_pnpinfo_str->bus_child_pnpinfo. This fixes the broken kernel
builds.
Sponsored by: Netflix
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
If the connection is in the process of disconnecting, ic_socket can be
NULL. For icl_cxgbei_conn_transfer_setup(), lock the connection and
check ic_socket before using it. For icl_cxgbei_conn_task_setup(),
the caller already holds the connection lock, so assert it and bail
early with ECONNRESET if the connection is disconnecting.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: f949967c8e cxgbei: Fix a race between transfer setup and a peer reset.
The INQUIRY command may return a CAM_DATA_RUN_ERR code, even when
it succeeds. This happens during driver startup, causing the
current and further inquiries to be aborted, resulting in some
missing information about the controller.
Reviewed by: imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30843
Reserve one page for the DMA subsystem, that may need it when the I/O
buffer is not page aligned.
Without this change, writes with the maximum allowed size failed, if:
- physical memory was fragmented, making it necessary to use one DMA
segment for each page
- the buffer to be written was not page aligned, causing the DMA
subsystem to need one extra segment
In the scenario above, the DMA subsystem would run out of segments,
resulting in a write with no SG segments, that would fail.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30798
ni_txseqs is kept as 16-bit counter, but we need to trim the upper four
bits as they may have special meanings for the firmware / hardware.
For instance, bit 15 enables hardware / firmware generation of sequence
numbers that overrides sequence numbers programmed by the driver.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30814
Given all the code does operate on struct ifnet, the last step in this
longer series of changes now is to rename struct net_device to
struct ifnet (that is what it was defined to in the LinuxKPi code).
While mlx4 and OFED are "shared" code the decision was made years ago
to not write it based on the netdevice KPI but the native ifnet KPI
for most of it. This commit simply spells this out and with that
frees "struct netdevice" to be re-done on LinuxKPI to become a more
native/mixed implementation over time as needed by, e.g., wireless
drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30515
The NIC TLS and TOE TLS modes in cxgbe(4) both work with TLS key
contexts. Previously, TOE TLS supported TLS key contexts created by
two different methods, and NIC TLS had a separate bit of code copied
from NIC TLS but specific to KTLS. Now that TOE TLS only supports
KTLS, pull common code for creating TLS key contexts and programming
them into on-card memory into t4_keyctx.c.
Sponsored by: Chelsio Communications
TOE TLS offload was first supported via a customized OpenSSL developed
by Chelsio with proprietary socket options prior to KTLS being present
either in FreeBSD or upstream OpenSSL. With the addition of KTLS in
both places, cxgbe's TOE driver was extended to support TLS offload
via KTLS as well. This change removes the older interface leaving
only the KTLS bindings for TOE TLS.
Since KTLS was added to TOE TLS second, it was somehat shoe-horned
into the existing code. In addition to removing the non-KTLS TLS
offload, refactor and simplify the code to assume KTLS, e.g. not
copying keys into a helper structure that mimic'ed the non-KTLS mode,
but using the KTLS session object directly when constructing key
contexts.
This also removes some unused code to send TX keys inline in work
requests for TOE TLS. This code was never enabled, and was arguably
sending the wrong thing (it was not sending the raw key context as we
do for NIC TLS when using inline keys).
Sponsored by: Chelsio Communications
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation