These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool. Hash and
normal TCAM filters can be used together. The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters. Any T5/T6 card with memory can support at
least half a million hash filters. The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.
The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file. The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto). It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).
For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan
Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.
Hash filters support all actions supported by TCAM filters. A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).
Sponsored by: Chelsio Communications
With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().
Reported by: gallatin
Reviewed by: sbruno
Approved by: sbruno
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14984
The properties 'assigned-clocks', 'assigned-clock-parents' and
'assigned-clock-rates' all work together.
'assigned-clocks' holds the list of clock for which we need to either
assign a new parent or a new frequency.
The old code just supported assigning a new parents, add support for
assigning a new frequency too.
These firmwares and the following list of changes are from the public
ChelsioUwire-3.7.1.0 release.
T6 Firmware
================================================================================
Version : 1.19.1.0
Date : 04/23/2018
================================================================================
Fixes
-----
BASE:
- Fixed traffic stall when rate-limit is modified while running traffic.
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.
ETH:
- Exit Auto-Negotiation if we don't receive base page from peer within 10s.
This fixes some cases where in we keep on restarting auto negotiation without
ever exiting, resulting in link failure.
- Fixes an issue where VF packets counter were not increasing if VF packets
coalesced WR is used by driver.
OFLD:
- Kernel and user mode NVMEoF performance enhancements.
FOiSCSI:
- Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.
================================================================================
Version : 1.18.9.0
Date : 03/27/2018
================================================================================
Fixes
-----
BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
(RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
queues. This fixes an issue where driver was receiving interrupt with no new
messages in queue.
- FW_PARAMS_CMD processes all the valaid paramaters and returns value 0UL for
any unknown parameter.
OFLD:
- Fixes connection failure during SRQ reuse.
- Fixes incorrect cqe in case of WRITE with immediate operation.
FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.
FOFCoE:
- Fixes a fw hang while creating NPIV.
Enhancements
------------
ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.
================================================================================
Version : 1.18.4.0
Date : 02/28/2018
================================================================================
Fixes
-----
BASE:
- Fixed Rate limiting not working for 101Mbps<=rate limit<=163Mbps range.
- Fixed starting more than 32 VMs on PF4 causing firmware hang.
ETH:
- Fixed link failure due to FEC mismatch with optics.
- Fixed link failure with link toggle stress tests.
- Only BaseR FEC is supported for 50G.
- Fixed a bug in next page handling which sometimes causes link down.
- Fixed port down due to failre to read eeprom contents of some modules.
- Fixed a bug causing adapter to fail with spider configuration.
FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.
Enhancements
------------
BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).
ETH:
- Added support for user to contol pause negotiation during auto negotiation.
FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
(based on driver's configration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.
================================================================================
Version : 1.17.14.0
Date : 12/27/2017
================================================================================
FIXES
-----
BASE:
- Fixed an FLR failure during simulteneous power up of VM.
- Fixed an issue in vlan acl which was limiting vlan range to 1024.
ETH:
- Enabled RS-FEC for 25G active copper cable and 25GBASE-SR.
- When auto negotiation is enabled, final pause settings are resolved
based on local and peer pause settings.
- Handle NACK for an I2C access.
OFLD
- Fixed rdma connection cleanup in SO adpater.
- Fixed rdma connections during read invalidate.
- Fixed the crash when invalid BW rate is passed to fw.
- Fixed the traffic hang when BW allocation is changed from switch during traffic.
FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
on switch.
ENHANCEMENTS
------------
BASE:
- Added support for 248 VFs.
- Added fw driver periodic calibration for MC.
ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).
OFLD:
- Inline IPSec support added (flag F_FW_ULPTX_WR_DATA indicates the inline
IPSec WR).
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to
T5 Firmware
================================================================================
Version : 1.19.1.0
Date : 04/23/2018
================================================================================
Fixes
-----
BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.
ETH:
- Fixes an issue where VF packets counter were not increasing if VF packets
coalesced WR is used by driver.
OFLD:
- Fixes an issue where fw hangs if max traffic rate passed is 0.
FOiSCSI:
- Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.
================================================================================
Version : 1.18.9.0
Date : 03/27/2018
================================================================================
Fixes
-----
BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
(RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
queues. This fixes an issue where driver was receiving interrupt with no new
messages in queue.
ETH:
- Pad the Ethernet packets of size less than 64B with zeros. This fixes the
incorrect checksum generation of packets less then 64B.
FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.
FOFCoE:
- Fixes a fw hang while creating NPIV.
Enhancements
------------
ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.
================================================================================
Version : 1.18.4.0
Date : 02/28/2018
================================================================================
Fixes
-----
BASE:
- Fixed starting more than 32 VMs on PF4 causing firmware hang.
FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.
Enhancements
------------
BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).
ETH:
- Added support for user to contol pause negotiation during auto negotiation.
FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
(based on driver's configration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.
================================================================================
Version : 1.17.14.0
Date : 12/27/2017
================================================================================
FIXES
-----
BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.
ETH:
- Corrected lane inversion logic.
- Fixed improper LED behavior in T580 cards.
- When auto negotiation is enabled, final pause settings are resolved
based on local and peer pause settings.
- Handle NACK for an I2C access.
OFLD
- Fixed rdma connections during read invalidate.
FOiSCSI:
- Fixed a connections hang when link is toggled frequently.
FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
on switch.
ENHANCEMENTS
------------
BASE:
- Added support for 124 VFs.
ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).
OFLD:
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to
optimize NVMEoF write.
T4 Firmware
================================================================================
Version : 1.19.1.0
Date : 04/23/2018
================================================================================
Fixes
-----
BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.
FOiSCSI:
- Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.
================================================================================
Version : 1.18.9.0
Date : 03/27/2018
================================================================================
Fixes
-----
BASE:
- Added a new paramter iqtype to FW_IQ_CMD to identify the ingress NIC or
offload queues. This fixes an issue where driver was receiving interrupt with
no new messages in queue.
FOFCoE:
- Fixes a fw hang while creating NPIV.
Enhancements
------------
ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.
================================================================================
Version : 1.18.4.0
Date : 02/28/2018
================================================================================
Enhancements
------------
BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).
================================================================================
Version : 1.17.14.0
Date : 12/27/2017
================================================================================
FIXES
-----
BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.
MFC after: 3 days
Sponsored by: Chelsio Communications
Originally, I overlooked that PMIO register 0xc0 has a dual personality.
It can either be S5/Reset Status register or Misc. Fix register (aka
debug status register). The mode is controlled by bit 2 in PMIO
register 0xc4. Apparently there are register programming requirements
for the second personality, so many BIOSes leave the register, after
programming it, in that mode. So, we need to switch the register to the
correct mode.
Additionally, AMDSB8_WD_RST_STS was defined incorrectly as bit 13 while
it is actually bit 25 (and the register's width is 32 bits, not 16).
With this change I see the following in dmesg after a reset by the
watchdog:
amdsbwd0: ResetStatus = 0x42000000
amdsbwd0: Previous Reset was caused by Watchdog
MFC after: 2 weeks
GPUs often have a VGA PCI class code and are probed/attached
by the VGA driver. Allow them to be detached so they can
be presented as passthru devices to VM guests.
Submitted by: mmacy
Reviewed by: jhb, imp, rgrimes
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15269
Without the suspend method the watchdog may fire in S1 state. Without
the resume method the watchdog is not re-enabled after returning from S3
state. I observe this on one of my systems.
Not sure if watchdog(4) should participate in the suspend actions.
Right now everything is up to individual drivers.
MFC after: 2 weeks
According to the 802.1Q-2014 9.6 VLAN Tag Control Information, VID value 0
means that there is no VLAN tag assigned to the packet, and only PCP and
DEI values from the tag are meaningful. Current flow table programming
filter out such packets.
When programming VLAN filter for flow table, unconditionally add rule which
accept packets with VLAN id 0. The packets are already handled correctly
by the network stack.
Reviewed by: hselasky, slavash
Sponsored by: Mellanox Technologies
MFC after: 1 week
Admin pass-through requests took controller lock before the queue lock,
but in case of request submission to a failed controller controller lock
was taken after the queue lock. Fix that by reducing the lock scopes and
switching to mtx_pool locks to track pass-through request completion.
Sponsored by: iXsystems, Inc.
This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.
Submitted by: kbowling
Reviewed by: brooks imp jeffrey.e.pieper@intel.com
Relnotes: yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15234
Current interface to the gntdev in FreeBSD is wrong, and mostly worked
out of luck before the PTI FreeBSD fixes, when kernel and user-space
where sharing the same page tables.
On FreeBSD ioctls have the size of the passed struct encoded in the
ioctl number, because the generic ioctl handler in the OS takes care
of copying the data from user-space to kernel space, and then calls
the device specific ioctl handler. Thus using ioctl structs with
variable sizes is not possible.
The fix is to turn the array of structs at the end of
ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers,
that can be properly accessed from the kernel gntdev driver using the
copyin/copyout functions. Note that this is exactly how it's done for
the privcmd driver.
Sponsored by: Citrix Systems R&D
Refresh upstream driver before impending conversion to iflib.
Major changes:
- Support for descriptor writeback mode (required by ixlv(4) for AVF support)
- Ability to disable firmware LLDP agent by user (PR 221530)
- Fix for TX queue hang when using TSO (PR 221919)
- Separate descriptor ring sizes for TX and RX rings
PR: 221530, 221919
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: #IntelNetworking
MFC after: 1 day
Relnotes: Yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D14985
This driver supports legacy, 32-bit PCI devices, and had an ambiguous
license. Supported devices were already reported to be rare in 2003
(when an earlier version of the driver was removed in r123201).
Reviewed by: rgrimes
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15245
Filter work requests are submitted in the nexus cdev's ioctl which then
blocks waiting for a reply. If driver detach runs in this state and
disables interrupts the ioctl will never complete and detach will hang
in destroy_cdev.
Sponsored by: Chelsio Communications
Move the allwinner early printf support to the snps driver as it
should work with all implementation.
While here add instruction for enabling it on 64bits SoCs.
This is a prequisite before we remove the driver from -current.
Reviewed by: emaste kbowling imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D15244
This change allows clean device detach on attach failures and driver unload,
while previous code tried to talk to already shut down controller, or even
accessed resources failed to allocate.
Sponsored by: iXsystems, Inc.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch
the CPL. This allows all CPLs that use an atid to be used as shared
CPLs, although ACT_OPEN_RPL is the only one being converted in this
revision.
Sponsored by: Chelsio Communications
dma_tag_payload should not be destroyed before payload_dma_map, and seems
it should be used there instead of dma_tag to match creation.
Sponsored by: iXsystems, Inc.
intended recipient of a CPL when it can't be determined solely from the
opcode. Retire the per-queue handlers for such CPLs in favor of the new
scheme.
Sponsored by: Chelsio Communications
through a SYSCTL instead of a compile time define.
Add quirk by default for all LynxPoint XHCI controllers.
PR: 227602
MFC after: 3 days
Sponsored by: Mellanox Technologies
Per the datasheet, the BUSY bit must be set when reading or writing PHY
registers. From Linux commit 80928805babf.
Submitted by: Arshan Khanifar
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15217
In smsc_phy_init function, when the driver was trying to reset PHY, it
didn't poll for the correct bit (BMCR_RESET) to be cleared. Instead, it
anded it with MII_BMCR (which is 0), so it just exited the loop.
This issue was fixed in Linux in commit d94609200069.
Submitted by: Arshan Khanifar
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Previously, get_keyid() was returning the address of the receive key
instead of the transmit key when renegotiating the transmit key. This
could either hang the card (if a connection was only offloading TLS TX
and thus had a receive key address of -1) or cause the connection to
fail by overwriting the wrong key (if both RX and TX TLS were
offloaded).
Submitted by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
- Microsemi SCSI driver for PQI controllers.
- Found on newer model HP servers.
- Restrict to AMD64 only as per developer request.
The driver provides support for the new generation of PQI controllers
from Microsemi. This driver is the first SCSI driver to implement the PQI
queuing model and it will replace the aacraid driver for Adaptec Series 9
controllers. HARDWARE Controllers supported by the driver include:
HPE Gen10 Smart Array Controller Family
OEM Controllers based on the Microsemi Chipset.
Submitted by: deepak.ukey@microsemi.com
Relnotes: yes
Sponsored by: Microsemi
Differential Revision: https://reviews.freebsd.org/D14514
added/deleted, the delete the multicast addresses previously programmed
in HW and reprogram the new set of multicast addresses.
Submitted by:Vaishali.Kulkarni@cavium.com
MFC after:5 days
device-side (and only device-side) "virtual USB serial adapters" - the
ones you can get with an OTG-capable board - as consoles. It boils down
to adding the device name to kern.console sysctl, although doing that
requires jumping through some hoops. It doesn't change the actual
operation of those virtual devices. The point is to make it possible
for init(8) to recognize them as console devices and to launch getty(8)
for them, when configured as "onifconsole" in ttys(5). The point of
that, in turn, is to add such entries to the default ttys(5), so that
init(8) will launch gettys for device-side "virtual serial adapters",
but not for actual USB serial dongles.
Reviewed by: hselasky@
No objections: imp@
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
We intend to remove support before FreeBSD 12 is branched. These are
available only as 32-bit PCI devices. The driver has an ambiguous
license and I have not been successful in contacting the driver's author
in order to address this.
The planned deprecation has been announced on -current and -stable; if
we receive feedback that the driver is still useful and we are able to
resolve the license issue this deprecation notice can be reverted.
Reviewed by: bapt, brooks, imp, rgrimes
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15182
This sysctl allows a deeper dive into the sleep abyss comparing to
debug.acpi.suspend_bounce. When the new sysctl is set the system will
execute the suspend sequence up to the call to AcpiEnterSleepState().
That includes saving processor contexts and parking APs. Then, instead
of actually entering the sleep state, the BSP will call resumectx() to
emulate the wakeup. The APs should get restarted by the sequence of
Init and Startup IPIs that BSP sends to them.
MFC after: 8 days
Intel® Arria® 10 SoC.
Cadence Quad SPI Flash is not generic SPI controller, but SPI flash
controller, so don't use spibus here, instead provide quad spi flash
interface.
Since it is not on spibus, then mx25l flash device driver is not usable
here, so provide new n25q flash device driver with quad spi flash
interface.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10245
When disabling regulator when they are unused, check before is they are
enabled.
While here don't check the enable_cnt on the regulator entry as it is
checked by regnode_stop.
This solve the panic on any board using a fixed regulator that is driven
by a gpio when the regulator is unused.
Tested On: OrangePi One
Pointy Hat to: myself
Reported by: kevans, Milan Obuch (freebsd-arm@dino.sk)
Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
that cross this boundary, they perform better when this isn't the
case. Intel uses the 3rd byte in the vendor specific area for
this. The DC P3500 was previously listed without any explanation. Add
the DC P3520 and DC P4500 to the list.
There won't be any others drives needing this quirk. Intel has
standardized a field in the namespace data in 1.3 (noiob). A future
patch will use that if it exists, with fallback to this method.
Submitted by: Keith Busch
Reviewed by: jimharris@
ACPI I/O port descriptors use _MIN and _MAX fields to specify the set
of allowable base (start) addresses for an I/O port resource along with
a _LEN field specifying the length. A fixed resource is supposed to be
encoded with _MIN == _MAX, but some buggy firmwares instead set _MAX to
the end of the fixed range. Relocating I/O ranges only make sense in
_PRS (possible resource settings), not in _CRS (current resource settings),
so if an I/O port range with _MAX set set to the end of the range is
present in _CRS, treat it as a fixed I/O port resource starting at
_MIN.
PR: 224096
Submitted by: Harald Böhm <harald@boehm.codes>
Pointy hat to: jhb (taking so long to actually commit this)
MFC after: 1 week
When ixgbe was converted to iflib, it lost the SIOCGI2C support
that allows ifconfig to print SFP state, optical light levels, etc.
Restore this by plugging in to the ifdi_i2c_req iflib method. Note
that the sanity checking on dev_addr that used to be done in ixgbe is
now done in iflib.
Reviewed by: erj, Matthew Macy <mmacy@mattmacy.io>
Sponsored by: Netflix
Adjust sys/conf/files and sys/modules/puc/Makefile to omit
pucdata.c now tht it's included by puc_pci.c.
Submitted by: Lakhan Shiva Kamireddy (with build fixes by me)
Pull Request: https://github.com/freebsd/freebsd/pull/136
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload. t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface. The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.
A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so. The general format
of a rule is: [socket-type] expr => settings
Example. See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed
The driver processes the rules for each new listen, active open, or
passive open and stops at the first match. There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload
This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.
Sponsored by: Chelsio Communications
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.
Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.
PR: 182297
Reviewed by: jhibbits, vangyzen
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15057
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.
By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap. The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.
There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.
copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline. If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.
The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done. The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging. I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.
Tested by: pho
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14633
xdma(4) interface.
This allows us to switch between Altera mSGDMA or SoftDMA engines used by
atse(4) device.
This also makes atse(4) driver become 25% smaller.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9618
SoftDMA is a software implementation of DMA engine built using Altera
FIFO component.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9620
"Terminus BSD Console" is a derivative of Terminus that is provided
by Mr. Dimitar Zhekov under the 2-clause BSD license for use by the
FreeBSD vt(4) console and other BSDs.
PR: 227409
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Changelist:
- Turn tx_rings and rx_rings arrays into arrays of pointers to kring
structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
vtnet and ptnet drivers to cope with the change.
- Generalize the nm_config() callback to accept a struct containing many
parameters.
- Introduce NKR_FAKERING to support buffers sharing (used for netmap
pipes)
- Improved API for external VALE modules.
- Various bug fixes and improvements to the netmap memory allocator,
including support for externally (userspace) allocated memory.
- Refactoring of netmap pipes: now linked rings share the same netmap
buffers, with a separate set of kring pointers (rhead, rcur, rtail).
Buffer swapping does not need to happen anymore.
- Large refactoring of the control API towards an extensible solution;
the goal is to allow the addition of more commands and extension of
existing ones (with new options) without the need of hacks or the
risk of running out of configuration space.
A new NIOCCTRL ioctl has been added to handle all the requests of the
new control API, which cover all the functionalities so far supported.
The netmap API bumps from 11 to 12 with this patch. Full backward
compatibility is provided for the old control command (NIOCREGIF), by
means of a new netmap_legacy module. Many parts of the old netmap.h
header has now been moved to netmap_legacy.h (included by netmap.h).
Approved by: hrs (mentor)
In UTF-8 locales mandoc uses a number of characters outside of the Basic
Latin group, e.g. from general punctuation or miscellaneous mathematical
symbols, and these rendered as ? in text mode.
This change adds (char, replacement, code point, description):
– - U+2013 En Dash
⟨ < U+27E8 Mathematical Left Angle Bracket
⟩ > U+27E9 Mathematical Right Angle Bracket
This change addresses some common cases; there are others that still
need to be added after a more thorough review.
PR: 227409
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).
Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017
CAM_SEL_TIMEOUT was introduced in
https://reviews.freebsd.org/D7521 (r304251), which claimed:
"VM shall response to CAM layer with CAM_SEL_TIMEOUT to filter those
invalid LUNs. Never use CAM_DEV_NOT_THERE which will block LUN scan
for LUN number higher than 7."
But it turns out this is not correct:
I think what really filters the invalid LUNs in r304251 is that:
before r304251, we could set the CAM_REQ_CMP without checking
vm_srb->srb_status at all:
ccb->ccb_h.status |= CAM_REQ_CMP.
r304251 checks vm_srb->srb_status and sets ccb->ccb_h.status properly,
so the invalid LUNs are filtered.
I changed my code version to r304251 but replaced the CAM_SEL_TIMEOUT
with CAM_DEV_NOT_THERE, and I confirmed the invalid LUNs can also be
filtered, and I successfully hot-added and hot-removed 8 disks to/from
the VM without any issue.
CAM_SEL_TIMEOUT has an unwanted side effect -- see cam_periph_error():
For a selection timeout, we consider all of the LUNs on
the target to be gone. If the status is CAM_DEV_NOT_THERE,
then we only get rid of the device(s) specified by the
path in the original CCB.
This means: for a VM with a valid LUN on 3:0:0:0, when the VM inquires
3:0:0:1 and the host reports 3:0:0:1 doesn't exist and storvsc returns
CAM_SEL_TIMEOUT to the CAM layer, CAM will detech 3:0:0:0 as well: this
is the bug I reported recently:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226583
PR: 226583
Reviewed by: mav
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D14690
Some devices cannot rely on the switch MDIO address passed in the DTB
for specifying single/multi-chip addressing mode. Introduce new property
"single-chip-addressing" which added to DTS will force single-chip mode.
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D14800
Linux device tree binding, whose usage is obligatory,
comprises faulty representation of Marvell cryptographic
engine (CESA) - two engines are artificially gathered into
single DT node, in order to avoid certain SW limitation.
This patch improves the cesa driver to support above binding,
depending on compatible string, which helps to ensure
backward compatibility.
Submitted by: Patryk Duda
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D14760
The code now has a single, consistant flow for all three ioctl
variants. ifdefs and for pre-FreeBSD-7 compatability are moved to
functions and macros. So the flow is alwasy the same, we impose
the cost of allocating, copying to, updating from, and freeing a
copy of struct pci_conf_io on all paths.
This change will allow PCIOCGETCONF32 support currently in
sys/compat/freebsd32/freebsd32_ioctl.c to be moved here.
Reviewed by: kib, jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14978
Change OF_getencprop_alloc semantics to be combination of malloc and
OF_getencprop and return size of the property, not number of elements
allocated.
For the use cases where number of elements is preferred introduce
OF_getencprop_alloc_multi helper function that copies semantics
of OF_getencprop_alloc prior to this change.
This is to make OF_getencprop_alloc and OF_getencprop_alloc_multi
function signatures consistent with OF_getencprop_alloc and
OF_getencprop_alloc_multi.
Functionality-wise this patch is mostly rename of OF_getencprop_alloc
to OF_getencprop_alloc_multi except two calls in ofw_bus_setup_iinfo
where 1 was used as a block size.
Changelist:
- remove unused nkr_slot_flags
- new nm_intr adapter callback to enable/disable interrupts
- remove unused sysctls and document the other sysctls
- new infrastructure to support NS_MOREFRAG for NIC ports
- support for external memory allocator (for now linux-only),
including linux-specific changes in common headers
- optimizations within netmap pipes datapath
- improvements on VALE control API
- new nm_parse() helper function in netmap_user.h
- various bug fixes and code clean up
Approved by: hrs (mentor)
OF_getprop_alloc takes element size argument and returns number of
elements in the property. There are valid use cases for such behavior
but mostly API consumers pass 1 as element size to get string
properties. What API users would expect from OF_getprop_alloc is to be
a combination of malloc + OF_getprop with the same semantic of return
value. This patch modifies API signature to match these expectations.
For the valid use cases with element size != 1 and to reduce
modification scope new OF_getprop_alloc_multi function has been
introduced that behaves the same way OF_getprop_alloc behaved prior to
this patch.
Reviewed by: ian, manu
Differential Revision: https://reviews.freebsd.org/D14850
and VMware legal:
- Add a dual BSD-2 Clause/GPLv2 LICENSE file in the VMCI directory
- Remove the use of "All Rights Reserved"
- Per best practice, remove copyright/license info from Makefile
Reviewed by: imp, emaste, jhb, Vishnu Dasa <vdasa@vmware.com>
Approved by: VMware legal via Mark Peek <markpeek@vmware.com>
Differential Revision: https://reviews.freebsd.org/D14979
opposed to one that accidentally worked on the one arch I test-compiled for
on my first try).
Reported by: np@, O. Hartmann <ohartmann@walstatt.org>
Pointy hat: ian@
frequency ioctl actually set the corresponding ivar instead of just storing
the value locally in the softc (and then not using it for anything). Also,
return the correct error code if the ioctl cmd is not recognized.
FDT-based systems, and instead add proper FDT probe code. Because this
driver is freebsd-specific and just provides generic userland access to run
spibus transactions, there is no bindings document to mandate a compatible
string, so just arbitrarily use "freebsd,spigen".
implementation that must be used, it's just the base system default driver.
Also add a comment noting that we're being more liberal about the bus
frequency property than the dts binding documents require.
- Change the description string to "SPI bus" (was "spibus bus").
- This is the default driver for a SPI bus, not a generic implementation,
so return the probe value that indicates such.
- Use device_delete_children() at detach time, instead of a local loop
to enumerate the children and detach each one individually.
and phase) and the maximum bus speed can be changed. The chip select
number cannot be changed, because the device instances which are children
of spibus are inherently associated with the chip select number they were
instantiated for.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
enough rate, the IPMI code can print large numbers of messages to the
console, such as:
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
These seem to be innocuous from a system standpoint, and the user-
space code can deal with the failures. Therefore, suppress printing
these messages to the console unless bootverbose is enabled.
Obtained from: Netflix, Inc.
IFF_UP and IFF_DRV_RUNNING out of sync. ifhwioctl in the kernel pays no
attention to the return code from the driver ioctl during SIOCSIFFLAGS
so these messages are the only indication that the ioctl was called but
failed.
MFC after: 1 week
Sponsored by: Chelsio Communications
The change upgrades the driver to use the split Communication Status
Block (CSB) format. In this way the variables written by the guest
and read by the host are allocated in a different cacheline than
the variables written by the host and read by the guest; this is
needed to avoid cache thrashing.
Approved by: hrs (mentor)
According to device tree binding 'assigned-addresses' can refer to PCIE MMIO
register space. New function ofw_bus_assigned_addresses_to_rl is
provided to support it.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D14750
bootloaders such as u-boot might enable regulators, or simply regulators could
be enabled by default by the PMIC, even if we don't have a driver for
the device or subsystem.
Disable unused regulators just before going to userland.
A tunable hw.regulator.disable_unused is added to not disable them in case
this causes problems on some board but the default behavior is to disable
everything unused.
I prefer to break thinks now and fix them rather than never switch to the
case were we disable regulators.
Tested on : Pine64-LTS (an idle board goes from ~0.33A to ~0.27A)
Tested on : BananaPi M2
Differential Revision: https://reviews.freebsd.org/D14781
Invalid font data passed to PIO_VFONT can result in an integer overflow
in glyphsize. Characters may then be drawn on the console using glyph
map entries that point beyond the end of allocated glyph memory,
resulting in a kernel memory disclosure.
Submitted by: emaste
Reported by: Dr. Silvio Cesare of InfoSect
Security: CVE-2018-6917
Security: FreeBSD-SA-18:04.vt
Sponsored by: The FreeBSD Foundation
Two modules with the same name cannot be loaded, so Marvell specific drivers
cannot have the same name as the generic drivers.
Files with the same name, even in different folders overlaps their .o files,
so in order to prepare for supporting Marvell platforms in GENERIC armv7
config, modify conflicting names.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D14743
The TCB is read using a memory window right now. A better alternate to
get self-consistent, uncached information would be to use a GET_TCB
request but waiting for a reply from hw while holding non-sleepable
locks is quite inconvenient.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14817
We intend to remove support before FreeBSD 12 is branched.
Reviewed by: imp, emaste
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14890
The common pratice in ethernet device drivers is to fall back to
ether_ioctl() to implement generic ioctls not implemented by the driver
and to fail if no handler exists.
Convert these drivers to follow that practice rather than calling
ether_ioctl() for specific cases.
vxge(4) aready had the default case, but it was only called on failure
to match.
Reviewed by: imp
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14895
Make sure the command completion handler is not called when the device is
in internal error state. This can easily trigger use after free situations.
MFC after: 3 days
Sponsored by: Mellanox Technologies
During health care IRQ resources will be reallocated.
Newbus requires that Giant is locked before accessing
these resources.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Firmware dump collecting should be triggered in case firmware syndrome
with request for reset bit is set.
MFC after: 3 days
Submitted by: slavash@
Sponsored by: Mellanox Technologies
- Move the semaphore locking and unlocking to the same function.
- Flags are no longer needed if the reset and crdump will be done in the
same function.
MFC after: 3 days
Submitted by: slavash@
Sponsored by: Mellanox Technologies
- Move firmware dump prep and cleanup to init_one() and remove_one() so that
the init and cleanup will happen only upon driver reload.
- Add some prints to indicate firmware dump.
MFC after: 3 days
Submitted by: slavash@
Sponsored by: Mellanox Technologies
The old code checked for MLX5_CR_SPACE_DOMAIN which is irrelevant here.
However, if dev->vsec_addr would be 0, an access to wrong offset would
happen.
MFC after: 3 days
Submitted by: slavash@
Sponsored by: Mellanox Technologies
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
These objects are defined earlier in the same file; an extern declaration
after definition is redundant.
Broken in r331766 (introduction of ocs_fc(4)).
Sponsored by: Dell EMC Isilon
The ocs_fc(4) driver supports the following hardware:
Emulex 16/8G FC GEN 5 HBAS
LPe15004 FC Host Bus Adapters
LPe160XX FC Host Bus Adapters
Emulex 32/16G FC GEN 6 HBAS
LPe3100X FC Host Bus Adapters
LPe3200X FC Host Bus Adapters
The driver supports target and initiator mode, and also supports FC-Tape.
Note that the driver only currently works on little endian platforms. It
is only included in the module build for amd64 and i386, and in GENERIC
on amd64 only.
Submitted by: Ram Kishore Vegesna <ram.vegesna@broadcom.com>
Reviewed by: mav
MFC after: 5 days
Relnotes: yes
Sponsored by: Broadcom
Differential Revision: https://reviews.freebsd.org/D11423
When de(4) was imported in 1997 the world was not ready for these ioctls.
In over 20 years that hasn't changed so it seems safe to assume their
time will never come.
Reviewed by: imp, jhb
Approved by: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14889
A series of zero delay callouts can happen causing high CPU usage of the
timer subsystem when trying to repeat keys, because the time of the
absolute timeout is not moving forward. The condition clears when all
keys are released.
Reported by: Johannes Lundberg <johalun0@gmail.com>
Discussed with: bde@
PR: 226968
MFC after: 1 week
Sponsored by: Mellanox Technologies
Update to what the previous code seemed to be doing via the correct
interfaces. Further issues exist in xge_ioctl_registers(), but this is
debugging code in a driver that has few users and they don't appear to
be crashes or leaks.
Reviewed by: jhb (prior version)
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14848
No functional change in practice. If the sbni driver supported
64-bit big-endian system, this would be an ABI changes, but it is
i386-only. The old version leaked a word of stack on 64-bit systems.
This eliminates the only assignment to ifr_data.
Reviewed by: kib
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14874
No functional change in practice. If the sbni driver supported
64-bit big-endian system, this would be an ABI changes, but it is
i386-only. The old version leaked a word of stack on 64-bit systems.
This eliminates the only assignment to ifr_data.
Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.
However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.
Submitted by: Harsh Jain @ Chelsio (original version)
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14871
It is random collection of fixes for issues not yet corrected,
reported at https://tsyrklevi.ch/clang_analyzer/freebsd_013017/. Many
issues from that list were already corrected. Most of them are for
compat32, old compat32 or affect both primary host ABI and compat32.
The freebsd32_kldstat(), for instance, was already fixed by using
malloc(M_ZERO). Patch includes correction to report the supplied
version back, which is just pedantic.
Reviewed by: brooks, emaste (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14868
This is more correct in that ioctl commands have no meaning until they
hit the handler associated with the file descriptor.
Add support for MDIOCRESIZE_32 which was missed when it was added.
Reviewed by: cem, kib, markj (various versions)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14714
Include _uio.h instead of uio.h in several headers to reduce header
polution.
Fix a few places that relied on header polution to get the uio.h header.
I have not moved struct uio as many more things that use it rely on
header polution to get other definitions from uio.h.
Reviewed by: cem, kib, markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14811
To fix the GCC build, remove multiple redundant declarations of
vmci_send_datagram() (the copy in vmci.h as well as the extern definition in
vmci_queue_pair.c were wholly redundant).
Also to fix the GCC build, include a non-empty format string in the vmci(4)
definition of ASSERT(). It seems harmless either way, but adding the
stringified invariant is easier than masking the warning.
The other vmci_kernel_defs.h changes are cosmetic and simply match macros to
existing definitions.
Reported by: GCC 6.4.0
Sponsored by: Dell EMC Isilon
In a virtual machine, VMCI is exposed as a regular PCI device. The primary
communication mechanisms supported are a point-to-point bidirectional
transport based on a pair of memory-mapped queues, and asynchronous
notifications in the form of datagrams and doorbells. These features are
available to kernel level components such as vSockets through the VMCI
kernel API. In addition to this, the VMCI kernel API provides support for
receiving events related to the state of the VMCI communication channels,
and the virtual machine itself.
Submitted by: Vishnu Dasa <vdasa@vmware.com>
Reviewed by: bcr, imp
Obtained from: VMware
Differential Revision: https://reviews.freebsd.org/D14289
controlled by the TCP_BLACKBOX option.
Enable this as part of amd64 GENERIC. For now, leave it disabled on
other platforms.
Sponsored by: Netflix, Inc.
This fixes an avoidable EINVAL when the user tries to disable AN after
the port is initialized but l1cfg doesn't have a valid speed to use.
MFC after: 1 week
Sponsored by: Chelsio Communications
error state.
If the device is in internal error state the hardware will not
generate completions. Just move on to destroy the resources.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Change page cleanup flow when in internal error to properly decrement
the page counts when reclaiming pages. That prevents timing out
waiting for extra pages that were actually cleaned up previously.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a PCI error is detected the PCI state could be corrupt, don't
save it in that flow. Save the state after initialization. After
restoring the PCI state during slot reset save it again, restoring
the state destroys the previously saved state info.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Since the FW can be shared between PCI functions it is common that
more than one health poll will detected a failure, this can lead to
multiple resets.
The solution is to use a FW locking mechanism using semaphore space to
provide a way to synchronize between functions. The FW semaphore is
acquired via config cycle access. First the VSEC gateway must be
acquired, then the semaphore can be locked by writing a value to it
and confirmed it's locked by reading the same value back. The process
in the same to free the semaphore, except the value written should be
zero.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
If a FW assert is considered fatal, indicated by a new bit in the
health buffer, reset the FW. After the reset, follow the normal
recovery flow.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Some mlx5 adapter firmware allows the driver to reset the firmware in
the event of an error. When a software reset is issued on any physical
function all PFs enter reset state. This is a recoverable condition.
The existing recovery flow was designed to allow the recovery of a
VF after a PF driver reload. This patch expands the scope of that
flow to recover PFs or VFs after a SW reset has been issued.
When a software reset is issued the following occurs:
1. The NIC interface mode is set to SW_RESET (7) while the reset is in
progress.
2. Once the reset completes the NIC interface mode is set to NIC
disabled (1).
After the reset has been issued (added in a subsequent patch) the
health poll for other functions will detect that the NIC interface
state has been set to disabled. This will cause it to enter the
existing recovery flow. If the PCI is still working (meaning it
doesn't return 0xff on all reads) it means recovery can proceed
immediately instead of waiting 60 seconds.
The error detetion has also been refactored to avoid incorrect or
misleading log messages.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
When mlx5_enter_error_state() operation is forced by shutdown, the
messages surrounding setting the error state are not informational
and confuse users.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch accumulates the following Linux commits:
- 8812c24d28f4972c4f2b9998bf30b1f2a1b62adf
net/mlx5: Add fast unload support in shutdown flow
- 59211bd3b6329c3e5f4a90ac3d7f87ffa7867073
net/mlx5: Split the load/unload flow into hardware and software flows
- 4525abeaae54560254a1bb8970b3d4c225d32ef4
net/mlx5: Expose command polling interface
Submitted by: Matthew Finlay <matt@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch accumulates the following Linux commits:
- 04c0c1ab38e95105d950db5b84e727637e149ce7
net/mlx5: PCI error recovery health care simulation
- 0179720d6be2096b8d0a4d143254ff9e77747daa
net/mlx5: Introduce trigger_health_work function
- 3fece5d676939f42f434c63dfe1bd42d7d94e6f0
net/mlx5: Continue health polling until it is explicitly stopped
Submitted by: Matthew Finlay <matt@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
The mlx5e_destroy_ifp() function may be called from the system workqueue and
in this case trying to flush all works will cause a dead lock.
Instead of using the system workqueue, create a designated workqueue
for each mlx5en(4) device instance.
Submitted by: slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Mark the table with PNP info.
Fix compilation by returning FILTER_STRAY in two places, as suggested by comments.
Create a simple module from this. Left unconnected because I can't test it as a module.
The mps(4) and mpr(4) drivers and hardware handle T10 Protection
Information, which is a system of checksums and guard blocks to protect
data while it is being transferred and while it is on disk. It is also
known as T10 DIF. For more details, see section 4.22 of the SBC-4 spec.
Supporting Type 2 protection requires using 32 byte CDBs, and filling in
the fields in those CDBs. We don't yet support that in the da(4) driver.
Type 1 and Type 3 protection don't require that, and can be handled by
the mps(4)/mpr(4) driver's code and firmware without any additional
input from the da(4) driver.
If a drive has Type 2 protection enabled (you frequently see this with
SAS drives shipped from Dell), don't set the various EEDP fields in the
mps(4)/mpr(4) driver command fields. Otherwise, you wind up with errors
like this that would otherwise make no sense:
(da9:mpr0:0:18:0): READ(10). CDB: 28 00 00 00 00 00 00 02 00 00
(da9:mpr0:0:18:0): CAM status: SCSI Status Error
(da9:mpr0:0:18:0): SCSI status: Check Condition
(da9:mpr0:0:18:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Field Replaceable Unit: 0
(da9:mpr0:0:18:0): Command Specific Info: 0
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Descriptor 0x80: f8 21
(da9:mpr0:0:18:0): Descriptor 0x81: 00 00 00 00 00 00
(da9:mpr0:0:18:0): Error 22, Unretryable error
In other words, what kind of strange SAS hard drive doesn't support a
standard 10 byte SCSI READ command? In this case, one that has Type 2
protection enabled.
We can revisit this when we put Type 2 protection support in the da(4)
driver, but for now this will help people who put Type 2 formatted drives
in a system and wonder what in the world is going on.
MFC after: 3 days
Sponsored by: Spectra Logic
endpoints. The Allwinner driver will need to set this as the EPINFO
register isn't useful there.
Submitted by: jmcneill
Reviewed by: hselasky
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D5881
There's no real annotation for it, so it's not immediately obvious to the
unfamiliar that these pointers are to locations in the EFI runtime map
unlike the system table pointer immediately above them.
bhnd_nv_strdup/bhnd_nv_strndup.
If malloc(9) failed during initial bhnd(4) attach, while allocating the root
NVRAM path string ("/"), the returned NULL pointer would be passed as the
destination to memcpy().
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
On amd64, efi_enter calls fpu_kern_enter(). This may not be called until
fpuinitstate has been invoked, resulting in a kernel panic with
efirt_load="YES" in loader.conf(5).
Move fpuinitstate a little earlier in SI_SUB_DRIVERS so that we can squeeze
efirt between it and efirtc at SI_SUB_DRIVERS, SI_ORDER_ANY. efidev must be
after efirt and doesn't really need to be at SI_SUB_DEVFS, so drop it at
SI_SUB_DRIVER, SI_ORDER_ANY.
The not immediately obvious dependency of fpuinitstate by efirt has been
noted in both places.
Discussed with: kib, andrew
Reported by: Jakob Alvermark <jakob@alvermark.net>
X-MFC-With: r330868
Normally, shutdown_nice() just signals init. However, sometimes it
calls kern_reboot directly. For that case, r331298 dropped the Giant
lock before calling it. This turns out to be incorrect for the more
common case where init exists and we just signal it. Restore the old
behavior. The direct call to kern_reboot() doesn't sync buffers to the
disk, so should work with Giant held, so we don't need to drop locks
here for that.
Noticed by: bde@
Sponsored by: Netflix
summits at BSDCan and BSDCam in 2017.
The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.
It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.
You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.
This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.
There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.
Reviewed by: gnn (previous version)
Obtained from: Netflix, Inc.
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D11085
Don't limit the copy to the size of the target string *pointer* (always
4 on 32-bit / 8 on 64-bit). Instead, just use strdup().
Reported by: Coverity
CID: 1386912
Reviewed by: cem, imp
MFC after: 1 week
drm_modeset_ctl() takes a signed in from userland, does a boundscheck,
and then uses it to index into a structure and write to it. The
boundscheck only checks upper bound, and never checks for nagative
values. If the int coming from userland is negative [after conversion]
it will bypass the boundscheck, perform a negative index into an array
and write to it, causing memory corruption.
Note that this is in the "old" drm driver; this issue does not exist
in drm2.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: cem
MFC after: 1 day
Sponsored by: The FreeBSD Foundation
drm_infobufs() has a structure on the stack, fills it out and copies it
to userland. There are 2 elements in the struct that are not filled out
and left uninitialized. This will leak uninitialized kernel stack data
to userland.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 1 day
Security: Kernel memory disclosure (798)
"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."
Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Approved by: shurd
Differential Revision: https://reviews.freebsd.org/D14766
On the Allwinner SoCs we need to set a custom endpoint configuration. To
allow for this use a table to store the configuration so the attachment
can override it.
Reviewed by: hselasky
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14783
were the new kids on the block and F00F hacks were all the rage, one
needed to take out Giant to do anything moderately complicated with
the VM, mappings and such. So the pccard / cardbus code held Giant for
the entire insertion or removal process.
Today, the VM is MP safe. The lock is only needed for dealing with
newbus things. Move locking and unlocking Giant to be only around
adding and probing devices in pccard and cardbus.
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
The U-Boot efi runtime service expects us to set the address map before
calling any runtime services. It will then remap a few functions to their
runtime version. One of these is the gettime function. If we call into
this without having set a runtime map we get a page fault.
Add a check to see if this is valid in efi_init() so we don't try to use
the possibly invalid pointer.
Reviewed by: imp, kevans (both previous version)
X-MFC-With: r330868
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14759
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.
Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.
Sponsored by: Netflix
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14645
Version 16 is just a number bump, since we already had those changes.
Version 17 introduces new AdapterType value, that allows new user-space
tools from Broadcom to differentiate adapter generations 3 and 3.5.
Version 18 updates headers and adds SAS_DEVICE_DISCOVERY_ERROR reporting.
MFC after: 2 weeks
This patch will:
- Update ixgbe shared code
- Add support for Intel(R) Ethernet Connection X552 1000BASE-T
- Add error handling for link state check preventing VF from stopping traffic
after changing PF's MTU value
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: Intel Networking
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D13885
about the chip including the erase block size at attach time.
Also add myself to the copyrights since at this point svn blame would point
to me as the culprit for much of this.
all over the place. Also pass the softc as the arg to all the internal
functions instead of passing a device_t and calling device_get_softc() in
each function.
before starting them.
Using the wait-before logic would make sense if there was useful time-
consuming work that could be done between the end of one write and the
beginning of the next, but it also requires doing the wait-for-ready before
reading, because a prior write or erase could still be in progress. Reading
is the far more common case, so adding a whole extra bus transaction to
check for ready before each read would soak up any small gains that might be
had from doing async writes.
for the same condition that the preceeding lines checked for and would have
returned EIO, so the assert could never possibly trigger (sc_sectorsize must
inherently be an integer multiple of FLASH_PAGE_SIZE).
transfers data in both directions at once. When writing to the device,
use a dummy buffer for the incoming data, not the same buffer as the
outgoing data. Writes are done in FLASH_PAGE_SIZE chunks, which is only
256 bytes, so just put the dummy buffer into the softc.
Occasionally poll for signals during large reads of the /dev/u?random
devices. This allows cancellation via SIGINT of accidental invocations of
very large reads. (A 2GB /dev/random read, which takes about 10 seconds on
my 2017 AMD Zen processor, can be aborted.)
I believe this behavior was intended since 2014 (r273997), just not fully
implemented.
This is motivated by a potential getrandom(2) interface that may not
explicitly forbid extremely large reads on 64-bit platforms -- even larger
than the 2GB limit imposed on devfs I/O by default. Such reads, if they are
to be allowed, should be cancellable by the user or administrator.
Reviewed by: delphij
Approved by: secteam (delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14684
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.
Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.
Differential Revision: https://reviews.freebsd.org/D14565 (timed out)
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.
This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.
PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.
Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.
Reviewed by: markj, cem, kib (assorted versions)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14704
Move locks from outside ioctl to the individual implementations.
This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.
Reviewed by: cem, kib, markj (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14700
unconditionally incrementing i in the loop;
Reported by: cem
MFC with: r330880
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14685
It seems default timeout of 100ms is not enough for my 2694L card,
while it was perfectly fine for others, even for full-height 2694.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
The NVME standard has required in section 7.2.6, since at least 1.1,
that a clean shutdown is signalled by deleting the subission and the
completion queues before setting the shutdown bit in CC. The 1.0
standard, apparently, did not and many of the early Intel cards didn't
care. Some newer cards care, at least one whose beta firmware can
scramble the card on an unclean shutdown. Linux has done this for some
time. To make it possible to move forward with an evaluation of this
pre-release card with wonky firmware, delete the queues on the card
when we delete the qpair structures.
Sponsored by: Netflix
We'll need to delete namespaces soon, so go ahead and stop making
these devices eternal. It doesn't help much, and will be getting in
the way soon.
Sponsored by: Netflix
During shutdown mps waits for its SSU requests to complete however when
performing a reboot after handling a panic the scheduler is stopped so
getmicrotime which is used can be non-functional.
Switch to using the same method as shutdown_panic to ensure we actually
complete.
In addition reduce the timeout when RB_NOSYNC is set in howto as we expect
this to fail.
Reviewed by: slm
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D12776
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.
Reported by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.
Sponsored by: Chelsio Communications
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).
Sponsored by: Netflix
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.
New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.
Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.
Reviewed by: np (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14529
- Change t4_ddp_mod_load() to return void instead of always returning
success. This avoids having to pretend to have proper support for
unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
The module handling code in the kernel invokes MOD_UNLOAD on a module
whose MOD_LOAD fails with an error already.
Reviewed by: np (part of a larger patch)
MFC after: 1 month
Sponsored by: Chelsio Communications
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.
Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system this bug is not
deemed to warrant either a security advisory or an eratta notice.
Reviewed by: kib
Obtained from: CheriBSD
Discussed with: security-officer (gordon)
MFC after: 3 days
Security: kernel heap buffer overflow
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14685
For _IO() ioctls, addr is a pointer to uap->data which is a caddr_t.
When the caddr_t stores an int, dereferencing addr as an (int *) results
in truncation on little-endian 64-bit systems and corruption (owing to
extracting top bits) on big-endian 64-bit systems. In practice the
value of chan was probably always zero on systems of the latter type as
all such FreeBSD platforms use a register-based calling convention.
Reviewed by: mav
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14673
As noted in the comment, UEFI spec claims the capabilities pointer is
optional, but some implementations will choke and attempt to dereference it
without checking. This specific problem was found on a Lenovo Thinkpad X220
that would panic in efirtc_identify.
On x86 the IA-PC Boot Flags in the FADT can signal whether VGA is
available or not.
Sponsored by: Citrix systems R&D
Reviewed by: marcel
Differential revision: https://reviews.freebsd.org/D14397
The gcc 7 does check for switch statement fall through cases, and if legit,
such complaint can besilenced by /* FALLTHROUGH */ comment. Unfortunately
such comment is quite limited, but will still notify the reader.
This patch is backport from illumos, see
https://www.illumos.org/rb/r/941/
Reviewed by: eadler
Differential Revision: https://reviews.freebsd.org/D14663
For each regulators create an hw.regulator.<regname>. :
uvolt: Current value
always_on: 1 If the reg is always on
boot_on: 1 If the reg is set at boot time
enable_cnt: Number of consumer(s)
enable_delay: Delay before enabling the regulator
ramp_delay: The Ramp delay
max_uamp: The maximum value of the regulator in uAmps
min_uamp: The minimal value of the regulator in uAmps
max_uvolt: The maximum value of the regulator in uVolts
min_uvolt: The minimal value of the regulator in uVolts
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D14578
These parameters may be changed via ifconfig(8); by default,
mgt / mcast rates are lowest possible and ucast rate is not set
(matches previous configuration).
While here, store some variables locally for better readability.
this check on open, but "iscsictl -M", or an iSCSI redirect received by
iscsid(8) could end up with two sessions with the same target name and
portal.
MFC after: 2 weeks