Commit graph

24 commits

Author SHA1 Message Date
John Baldwin
f46d4971b5 nvmf: Handle shutdowns more gracefully
If an association is disconnected during a clean shutdown, abort all
pending and future I/O requests with an error to avoid hangs either due
to filesystem unmounts or a stuck GEOM event.

If an association is connected during a clean shutdown, gracefully
disconnect from the remote controller and close the open queues.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45462
2024-06-05 12:59:28 -07:00
John Baldwin
aacaeeee8e nvmf: Permit failing I/O requests while disconnected
Add a kern.nvmf.fail_on_disconnection sysctl similar to the
kern.iscsi.fail_on_disconnection sysctl.  This causes pending I/O
requests to fail with an error if an association is disconnected
instead of requeueing to be retried once the association is
reconnected.  As with iSCSI, the default is to queue and retry
operations.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45308
2024-06-05 12:59:07 -07:00
John Baldwin
e140f85dc1 nvmf: Rescan namespaces after reconnecting
While a host was disconnected from a remote controller, namespaces
might have been added, removed, or altered properties.  Rescan the
namespaces after reconnecting to detect any such changes.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45461
2024-06-05 12:53:08 -07:00
John Baldwin
f6d434f110 nvmf: Rescan all namespaces if the changed NS log page is too large
Previously this just punted with a warning message.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45460
2024-06-05 12:52:43 -07:00
John Baldwin
8a082ca89f nvmf: Factor out most of nvmf_rescan_ns into a helper routine
This function accepts a namespace ID and associated namespace data
from IDENTIFY and takes care of updating nvmeXnY and ndaZ.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45459
2024-06-05 12:52:24 -07:00
John Baldwin
02ddb305cc nvmf: Refactor nvmf_add_namespaces to be more generic
Rename to nvmf_scan_active_namespaces and accept an additional
callback function and callback argument.  The callback is invoked on
each active namespace enumerated by the active namespace list from the
IDENTIFY command.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45458
2024-06-05 12:51:56 -07:00
John Baldwin
bed59baba2 nvmf: Pass const pointers to namespace data to nvmf_*_ns
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45457
2024-06-05 12:51:37 -07:00
Chuck Tuffli
ce75bfcac9 nvme: Change namespace device name
Changes the device name for NVMe and NVMe-oF namespaces from using "ns"
to "n" to be more compatible with other operating systems. For example,
a device which was previously /dev/nvme0ns1 is now /dev/nvme0n1.

Preserves the existing functionality by creating alias from nvmeXnY to
nvmeXnsY.

Reviewed by:	imp
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D45414
2024-06-01 04:14:14 -07:00
John Baldwin
da4230af3f nvme/f: Use strlcpy instead of strncpy + manual string termination
Reviewed by:	dab, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45153
2024-05-13 12:04:03 -07:00
John Baldwin
50884a0b09 nvmf_transport: Remove invalid assertion
This is leftover from an earlier iteration of the code where 'nt' was
not dynamically allocated but was the passed in 'ops' pointer so was
always alive.

Reported by:	Coverity Scan
CID:	 	1545042
Sponsored by:	Chelsio Communications
2024-05-10 09:13:40 -07:00
John Baldwin
1f029b86bb nvmf: Use strlcpy instead of strncpy to ensure termination
Reported by:	Coverity Scan
CID:	 	1545054
Sponsored by:	Chelsio Communications
2024-05-10 08:56:51 -07:00
John Baldwin
1d425ef341 nvmf: Add explicit alignment for struct nvmf_fabric_cmd
This avoids -Wcast-align warnings from clang when upcasting from
struct nvmf_fabric_cmd to struct nvmf_fabric_prop_set_cmd.

Reported by:	bapt
Sponsored by:	Chelsio Communications
2024-05-06 15:19:39 -07:00
John Baldwin
a7db82cfd9 nvmf_tcp: Correct tests for PDU direction
Add parentheses to ensure the correct order of operations.

Reported by:	GCC
2024-05-06 14:03:48 -07:00
John Baldwin
e75a79f40b nvmf: Remove packing pragmas from nvmf_proto.h
The protocol structures do not need explicit packing and static
assertions verify the size of all the structures as well as the
offsets of several key fields.  The pragma triggers warnings when
building with GCC.

Sponsored by:	Chelsio Communications
2024-05-06 14:03:44 -07:00
John Baldwin
a15f7c96a2 nvmft: The in-kernel NVMe over Fabrics controller
This is the server (target in SCSI terms) for NVMe over Fabrics.
Userland is responsible for accepting a new queue pair and receiving
the initial Connect command before handing the queue pair off via an
ioctl to this CTL frontend.

This frontend exposes CTL LUNs as NVMe namespaces to remote hosts.
Users can ask LUNS to CTL that can be shared via either iSCSI or
NVMeoF.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44726
2024-05-02 16:38:30 -07:00
John Baldwin
a1eda74167 nvmf: The in-kernel NVMe over Fabrics host
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8).  An nvmeX new-bus device is created
at the top-level to represent the remote controller similar to PCI
nvmeX devices for PCI-express controllers.

As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
pass through commands can be submitted to either the namespace devices
or the controller device.  For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.

nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport.  nvmf does not support nvd(4), only nda(4).

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44714
2024-05-02 16:29:37 -07:00
John Baldwin
59144db3fc nvmf_tcp: Add a TCP transport for NVMe over Fabrics
Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c).  One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs.  Thus, the data transfer for a given command is managed entirely
in the transport backend.

Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair.  On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread.  The transmit thread converts
command response capsules into suitable TCP PDUs where each PDU is
described by an mbuf chain that is then queued to the backing socket's
send buffer.  Command capsules can embed data along with the NVMe
command.

On the receive side, a socket upcall notifies the receive kthread when
more data arrives.  Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread.  PDUs such as R2T or data
related PDUs are handled internally, with callbacks invoked if a data
transfer encounters an error, or once the data transfer has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.

struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule-data as described in the NVMeoF
spec.  Data related PDUs such as R2T, C2H, and H2C are associated with
a command buffer except in the case of the send_controller_data
transport method which simply constructs one or more C2H PDUs from the
caller's mbuf chain.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44712
2024-05-02 16:28:47 -07:00
John Baldwin
aa1207ea4f nvmf: Add infrastructure kernel module for NVMe over Fabrics
nvmf_transport.ko provides routines for managing NVMeoF queue pairs
and capsules.  It provides a glue layer between transports (such as
TCP or RDMA) and an NVMeoF host (initiator) and controller (target).

Unlike the synchronous API exposed to the host and controller by
libnvmf, the kernel's transport layer uses an asynchronous API built
on callbacks.  Upper layers provide callbacks on queue pairs that are
invoked for transport errors (error_cb) or anytime a capsule is
received (receive_cb).

Data transfers for a command are usually associated with a callback
that is invoked once a transfer has finished either due to an error
or successful completion.

For an upper layer that is a host, command capsules are allocated and
populated with an NVMe SQE by calling nvmf_allocate_command.  A data
buffer (described by a struct memdesc) can be associated with a
command capsule before it is transmitted via nvmf_capsule_append_data.
This function accepts a direction (send vs receive) as well as the
data transfer callback.  The host then transmits the command via
nvmf_transmit_capsule.  The host must ensure that the data buffer
described by the 'struct memdesc' remains valid until the data
transfer callback is called.  The queue pair's receive_cb callback
should match received response capsules up with previously transmitted
commands.

For the controller, incoming commands are received via the queue
pair's receive_cb callback.  nvmf_receive_controller_data is used to
retrieve any data from a command (e.g. the data for a WRITE command).
It can be called multiple times to split the data transfer into
smaller sizes.  This function accepts an I/O completion callback that
is invoked once the data transfer has completed.
nvmf_send_controller_data is used to send data to a remote host in
response to a command.  In this case a callback function is not used
but the status is returned synchronously.  Finally, the controller can
allocate a response capsule via nvmf_allocate_response populated with
a supplied CQE and send the response via nvmf_transmit_capsule.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44711
2024-05-02 16:28:32 -07:00
John Baldwin
2f7b0de1de nvmft: Add NVMeoF controller routines shared between kernel and userland
This includes functions to validate NVMe Qualified Names, compute an
initial value of the CAP property, validate changes to the CC
property, and populate the Identify Controller data structure for an
I/O controller.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44709
2024-05-02 16:27:53 -07:00
John Baldwin
bbd5b6fe91 nvmf_tcp.h: Internal header shared between userspace and kernel
- Helper macros for specific SGL types used with the TCP transport

- An inline function which validates various fields in TCP PDUs

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44708
2024-05-02 16:27:38 -07:00
John Baldwin
d86edc181a nvmf.h: New header defining ioctls for NVMe over Fabrics
This defines structures, ioctl commands, and related constants used
for both the Fabrics host and controller.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44706
2024-05-02 16:27:13 -07:00
John Baldwin
52d5738dc5 nvmf_proto.h: Add additional types and constants from the 1.1 spec
- Add opcode, command structure, and new error code for Disconnect
  fabrics opcode.

- Add a generic struct nvmf_fabric_command.

- Add constants for special controller ID values.

- Add constants for the cattr field in the Connect command and the
  default value for the kato field in the Connect command.

- Add constants for the offset of controller properties (Fabrics
  version of controller registers).

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44705
2024-05-02 16:26:56 -07:00
John Baldwin
878d102ab9 nvmf_proto.h: Update for use in FreeBSD
- Replace SPDK_STATIC_ASSERT with _Static_assert.

- Remove SPDK_ and spdk_ prefixes from types and constants.

- Switch to using FreeBSD headers, e.g. <dev/nvme/nvme.h> in place of
  "spdk/nvme_spec.h".

- Add a definition of NVME_NQN_FIELD_SIZE (from SPDK's nvme_spec.h).

- Remove constant for the fabrics opcode as this is already present in
  <dev/nvme/nvme.h>.

- Use types from <dev/nvme/nvme.h> for NVMe structures including
  struct nvme_sgl_descriptor, struct nvme_command, and
  struct nvme_completion.

- Use plain uint16_t in place of struct spdk_nvme_status.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44704
2024-05-02 16:26:33 -07:00
John Baldwin
f2e737683e nvmf_proto.h: NVMe over Fabrics protocol definitions
This is a copy of spdk/include/spdk/nvmf_spec.h as of commit
470e851852bb948334a272c9f8de495020fa082f from Intel's SPDK.
Subsequent commits will modify it to be suitable header for the
kernel, but importing the stock file first makes it easier to see
how the resulting header is derived from the original.

Reviewed by:	imp
Obtained from:	SPDK (https://github.com/spdk/spdk.git)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44703
2024-05-02 16:26:16 -07:00