For requests that hand off queues from userspace to the kernel as well
as the request to fetch reconnect parameters from the kernel, switch
from using flat structures to nvlists. In particular, this will
permit adding support for additional transports in the future without
breaking the ABI of the structures.
Note that this is an ABI break for the ioctls used by nvmf(4) and
nvmft(4). Since this is only present in main, I did not bother
implementing compatibility shims.
Inspired by: imp (suggestion on a different review)
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48230
This isn't really needed since the host driver never submits more
commands to a queue than it can hold, but I noticed that the
recently-added SQ head and tail sysctl nodes were not updating. This
fixes that and also uses these values to assert that we never
submit a command while a queue pair is full.
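
The head/tail bookkeeping described above can be sketched as a small ring-buffer model. This is a simplified illustration, not the driver's code: the structure and function names (sq_state, sq_is_full, sq_submit, sq_complete) are hypothetical, and the "one slot left empty" definition of full is the usual NVMe queue convention assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a host queue pair. */
struct sq_state {
	uint16_t qsize;		/* number of submission queue entries */
	uint16_t sqhead;	/* last SQ head reported by the controller */
	uint16_t sqtail;	/* next slot the host will fill */
};

/* The ring is full when advancing the tail would collide with the head. */
bool
sq_is_full(const struct sq_state *sq)
{
	return ((sq->sqtail + 1) % sq->qsize == sq->sqhead);
}

/* Submit one command: assert there is room, then advance the tail. */
void
sq_submit(struct sq_state *sq)
{
	assert(!sq_is_full(sq));
	sq->sqtail = (sq->sqtail + 1) % sq->qsize;
}

/* Record the SQ head value carried in a completion queue entry. */
void
sq_complete(struct sq_state *sq, uint16_t sqhd)
{
	assert(sqhd < sq->qsize);
	sq->sqhead = sqhd;
}
```

With a four-entry queue, three submissions fill the ring; a completion reporting a new head frees a slot again.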
Sponsored by: Chelsio Communications
Similar to nvme(4), use the current CPU to select which I/O queue to
use. The assignment in nvmf_attach() had to be moved down since
sc->num_io_queues is initialized in nvmf_establish_connection().
Note that nvmecontrol(8) still defaults to using a single I/O queue
for an association.
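
As a rough illustration of CPU-based queue selection, the sketch below maps a CPU index onto a queue index. The modulo mapping and the name select_io_queue are assumptions made for this example; the driver may distribute CPUs across queues differently. The assertion reflects the ordering constraint noted above: the queue count must be known before any selection is made.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick an I/O queue index for the given CPU.
 * A simple modulo mapping is assumed here for illustration only.
 */
unsigned int
select_io_queue(unsigned int cpu, unsigned int num_io_queues)
{
	/* The number of queues must already be initialized. */
	assert(num_io_queues > 0);
	return (cpu % num_io_queues);
}
```

With a single I/O queue, every CPU maps to queue 0, which matches the nvmecontrol(8) default described above.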
Sponsored by: Chelsio Communications
The active namespace list query fetches namespaces greater than the
passed-in namespace ID, not greater than or equal to the passed-in
namespace ID. Thus, a multi-page request should start with the last
namespace ID from the previous page, not that ID plus 1.
While here, make use of NVME_GLOBAL_NAMESPACE_TAG instead of a magic
number to handle the edge case that the last namespace ID in a page is
the largest valid namespace ID.
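
The paging rule can be sketched with a small model of the controller. Everything here is illustrative: fetch_active_nslist and count_active_namespaces are hypothetical names, the page is shrunk to 4 entries (a real list page holds 1024 IDs), GLOBAL_NAMESPACE_TAG mirrors NVME_GLOBAL_NAMESPACE_TAG, and the model assumes the active IDs are given in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_NAMESPACE_TAG	0xffffffffu	/* mirrors NVME_GLOBAL_NAMESPACE_TAG */
#define PAGE_ENTRIES		4		/* a real page holds 1024 IDs */

/*
 * Hypothetical controller model: fill 'page' with the active namespace
 * IDs strictly greater than 'nsid', zero-padded at the end.
 */
void
fetch_active_nslist(const uint32_t *active, size_t nactive, uint32_t nsid,
    uint32_t page[PAGE_ENTRIES])
{
	size_t i, n = 0;

	for (i = 0; i < PAGE_ENTRIES; i++)
		page[i] = 0;
	for (i = 0; i < nactive && n < PAGE_ENTRIES; i++) {
		if (active[i] > nsid)
			page[n++] = active[i];
	}
}

/* Walk every page and return the number of active namespaces seen. */
size_t
count_active_namespaces(const uint32_t *active, size_t nactive)
{
	uint32_t page[PAGE_ENTRIES];
	uint32_t nsid = 0;
	size_t i, total = 0;

	while (nsid != GLOBAL_NAMESPACE_TAG) {
		fetch_active_nslist(active, nactive, nsid, page);
		for (i = 0; i < PAGE_ENTRIES && page[i] != 0; i++)
			total++;
		if (i < PAGE_ENTRIES)
			break;	/* short page: no more namespaces */

		/* Resume from the last ID itself, not that ID plus 1. */
		nsid = page[PAGE_ENTRIES - 1];

		/* The last ID was the largest valid one: nothing can follow. */
		if (nsid >= GLOBAL_NAMESPACE_TAG - 1)
			nsid = GLOBAL_NAMESPACE_TAG;
	}
	return (total);
}
```

In this model, resuming from the last ID plus 1 would skip a namespace: after a page ending in 4, a query starting at 5 returns only IDs greater than 5.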
Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D47393
Previously the handler was removed from the wrong eventhandler list.
Fixes: f46d4971b5 nvmf: Handle shutdowns more gracefully
Sponsored by: Chelsio Communications
Previously this just dereferenced NULL qp pointers and panicked.
Instead, use a shared lock on the connection lock to protect access to
the qp pointers and allocate a request. If the controller is not
associated, fail the request with ECONNABORTED.
Possibly this should honor kern.nvmf.fail_on_disconnection and
block waiting for a reconnect request while disconnected if that
tunable is false.
Reported by: Suhas Lokesha <suhas@chelsio.com>
Sponsored by: Chelsio Communications
nvmf_submit_request() handles races with concurrent queue pair
destruction (or the queue pair being destroyed between
nvmf_allocate_request and nvmf_submit_request), so the lock is not
needed here. This avoids holding the lock across transport-specific
logic such as queueing mbufs for PDUs to a socket buffer, etc.
Holding the lock across nvmf_allocate_request() ensures that the queue
pair pointers in the softc are still valid as shutdown attempts will
block on the lock before destroying the queue pairs.
Sponsored by: Chelsio Communications
The last reference on a pending I/O request might be held by an mbuf
in the socket buffer. When this mbuf is freed, the I/O request is
completed which triggers completion of the CCB. However, this can
occur with locks held (e.g. with so_snd locked when the mbuf is freed
by sbdrop()) raising a LOR between so_snd and the CAM device lock.
Instead, defer CCB completion processing to a thread where locks are
not held.
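
The general deferral pattern can be sketched as follows. This is not the driver's mechanism: the types and functions (struct ccb, done_queue, ccb_defer_done, done_queue_drain) are hypothetical, and a real kernel implementation would also need its own lock around the queue and a worker thread or taskqueue to run the drain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CAM CCB awaiting completion. */
struct ccb {
	struct ccb *next;
	int done;
};

/* FIFO of CCBs whose completion has been deferred. */
struct done_queue {
	struct ccb *head;
	struct ccb **tailp;
};

void
done_queue_init(struct done_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/*
 * Called from a context that may hold other locks (for example, while
 * an mbuf is being freed): only queue the CCB, do not complete it.
 */
void
ccb_defer_done(struct done_queue *q, struct ccb *c)
{
	c->next = NULL;
	*q->tailp = c;
	q->tailp = &c->next;
}

/*
 * Called later from a worker with no locks held: complete every queued
 * CCB and return how many were processed.
 */
int
done_queue_drain(struct done_queue *q)
{
	struct ccb *c = q->head;
	int n = 0;

	done_queue_init(q);
	while (c != NULL) {
		struct ccb *next = c->next;

		c->done = 1;	/* completion processing happens here */
		n++;
		c = next;
	}
	return (n);
}
```

The point of the split is that the enqueue step takes no CAM locks, so it is safe to call with so_snd held.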
Sponsored by: Chelsio Communications
If an association is disconnected during a clean shutdown, abort all
pending and future I/O requests with an error to avoid hangs either due
to filesystem unmounts or a stuck GEOM event.
If an association is connected during a clean shutdown, gracefully
disconnect from the remote controller and close the open queues.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45462
Add a kern.nvmf.fail_on_disconnection sysctl similar to the
kern.iscsi.fail_on_disconnection sysctl. This causes pending I/O
requests to fail with an error if an association is disconnected
instead of requeueing to be retried once the association is
reconnected. As with iSCSI, the default is to queue and retry
operations.
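
The policy amounts to a small decision, sketched below. The enum and function names (io_action, pending_io_action) are hypothetical and exist only for this illustration; the specific error returned to the caller is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for an I/O request. */
enum io_action {
	IO_SUBMIT,	/* association is up: send the request */
	IO_REQUEUE,	/* hold the request until reconnect (default) */
	IO_FAIL		/* complete the request with an error */
};

/* Decide what to do with an I/O request given the association state. */
enum io_action
pending_io_action(bool connected, bool fail_on_disconnection)
{
	if (connected)
		return (IO_SUBMIT);
	return (fail_on_disconnection ? IO_FAIL : IO_REQUEUE);
}
```

With the sysctl left at its default, a disconnected association yields IO_REQUEUE.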
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45308
While a host was disconnected from a remote controller, namespaces
might have been added or removed, or had their properties altered.
Rescan the namespaces after reconnecting to detect any such changes.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45461
Previously this just punted with a warning message.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45460
This function accepts a namespace ID and associated namespace data
from IDENTIFY and takes care of updating nvmeXnY and ndaZ.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45459
Rename to nvmf_scan_active_namespaces and accept an additional
callback function and callback argument. The callback is invoked on
each active namespace enumerated by the active namespace list from the
IDENTIFY command.
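
The callback shape can be sketched as below for a single page of the list. The callback signature (ns_scan_cb), scan_nslist_page, and count_cb are assumptions for this example; the driver's actual callback may take different arguments. A zero entry is treated as the end of the list, as unused entries in an active namespace list are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: invoked once per active namespace ID. */
typedef void ns_scan_cb(uint32_t nsid, void *arg);

/* Invoke 'cb' on each active namespace ID in one page of the list. */
void
scan_nslist_page(const uint32_t *page, size_t nentries, ns_scan_cb *cb,
    void *arg)
{
	size_t i;

	for (i = 0; i < nentries; i++) {
		if (page[i] == 0)
			break;	/* zero marks the end of the list */
		cb(page[i], arg);
	}
}

/* Example callback: count the namespaces seen. */
void
count_cb(uint32_t nsid, void *arg)
{
	(void)nsid;
	(*(size_t *)arg)++;
}
```

Passing the behavior as a callback lets one enumeration routine serve both the initial scan and a later rescan.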
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45458
Changes the device name for NVMe and NVMe-oF namespaces from using "ns"
to "n" to be more compatible with other operating systems. For example,
a device which was previously /dev/nvme0ns1 is now /dev/nvme0n1.
Preserves the existing functionality by creating an alias from nvmeXnY
to nvmeXnsY.
Reviewed by: imp
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D45414
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8). An nvmeX new-bus device is created
at the top level to represent the remote controller, similar to PCI
nvmeX devices for PCI-express controllers.
As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
passthrough commands can be submitted to either the namespace devices
or the controller device. For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.
nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport. nvmf does not support nvd(4), only nda(4).
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44714