Commit graph

38 commits

Author SHA1 Message Date
John Baldwin
ef052adf09 nvmf: Narrow scope of sim lock in nvmf_sim_io
nvmf_submit_request() handles races with concurrent queue pair
destruction (or the queue pair being destroyed between
nvmf_allocate_request and nvmf_submit_request), so the lock is not
needed here.  This avoids holding the lock across transport-specific
logic such as queueing mbufs for PDUs to a socket buffer, etc.

Holding the lock across nvmf_allocate_request() ensures that the queue
pair pointers in the softc are still valid as shutdown attempts will
block on the lock before destroying the queue pairs.

Sponsored by:	Chelsio Communications
2024-09-25 21:14:06 -04:00
John Baldwin
aec2ae8b57 nvmf: Always use xpt_done instead of xpt_done_direct
The last reference on a pending I/O request might be held by an mbuf
in the socket buffer.  When this mbuf is freed, the I/O request is
completed which triggers completion of the CCB.  However, this can
occur with locks held (e.g. with so_snd locked when the mbuf is freed
by sbdrop()) raising a LOR between so_snd and the CAM device lock.
Instead, defer CCB completion processing to a thread where locks are
not held.

Sponsored by:	Chelsio Communications
2024-09-25 21:10:44 -04:00
John Baldwin
b1d324d987 ctl: Move extern for control_softc into <cam/ctl/ctl_private.h>
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46778
2024-09-25 10:21:18 -04:00
John Baldwin
1b3fa1ac36 nvmft: Defer datamove operations to a pool of taskqueue threads
Some block devices may request datamove operations from an ithread
context while holding locks.  Queue datamove operations to a taskqueue
backed by a thread pool to safely permit blocking allocations, etc. in
datamove handling.

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46551
2024-09-24 16:16:11 -04:00
John Baldwin
bedfac1f02 nvmf_tcp: Fully honor kern.nvmf.tcp.max_transmit_data for C2H_DATA PDUs
The previous version of tcp_send_controller_data avoided sending a
chain of multiple mbufs that exceeded the limit, but if an individual
mbuf was larger than the limit it was sent as a single, over-sized
PDU.  Fix by using m_split() to split individual mbufs larger than the
limit.

Note that this is not a protocol error, per se, as there is no limit
on C2H_DATA PDU lengths (unlike the MAXH2CDATA parameter).  This fix
just honors the administrative limit more faithfully.  This case is
also very unlikely with the default limit of 256k.

Sponsored by:	Chelsio Communications
2024-09-05 17:14:36 -04:00
John Baldwin
f5541f9f47 nvmfd/nvmft: Fix a typo "whiled" -> "while"
Sponsored by:	Chelsio Communications
2024-09-03 16:12:04 -04:00
John Baldwin
8ebacb9daa nvmf_tcp: Correct calculation of number of TTAGs to allocate
The increment of 1 was intended to convert qp->maxr2t from 0's based
to 1 based before multiplying by the queue length.

Sponsored by:	Chelsio Communications
2024-07-30 10:28:54 -04:00
John Baldwin
19c15e41c6 nvmf_tcp: Update R2T accounting stats when aborting command buffers
If a queue pair is destroyed (e.g. due to the TCP connection dropping)
while a host to controller data transfer is in progress, the
pending_r2ts counter can be non-zero.  This can later trigger an
assertion failure when the capsule is freed.  To fix, update the
relevant R2T accounting stats when aborting active command buffers
during queue pair destruction.

Sponsored by:	Chelsio Communications
2024-07-30 10:28:19 -04:00
John Baldwin
6df040ea6e nvmf_tcp: Avoid setting some unused parameters in tcp_allocate_qpair
Specifically, some parameters only apply to either controller or host
queue pairs but not both.

Sponsored by:	Chelsio Communications
2024-07-30 10:27:47 -04:00
John Baldwin
a14de491e0 nvmf_tcp: Use min() to simplify a few statements
Sponsored by:	Chelsio Communications
2024-07-30 10:26:14 -04:00
John Baldwin
5d0498db47 nvmf_tcp: Rename max_c2hdata sysctl to max_transmit_data
This sysctl sets a cap on the maximum payload of transmitted data PDUs
including both C2H_DATA and H2C_DATA PDUs, not just C2H_DATA PDUs.

Sponsored by:	Chelsio Communications
2024-07-25 15:29:43 -04:00
John Baldwin
dcfa6669a3 nvmft: Handle qpair allocation failures during handoff
If the transport fails to create a queue pair, fail with an error
rather than dereferencing a NULL pointer.

Sponsored by:	Chelsio Communications
2024-07-23 11:46:19 -04:00
John Baldwin
43d45f2641 nvmf_tcp: Don't require a data digest for PDUs without data
If a PDU (such as a Command Capsule PDU) on a connection that has
enabled data digests does not have a data section, it will not have
the the PDU data digest flag set.  The previous check was requiring
this flag to be present on all PDU types that support data sections
even if no data was included in the PDU.

Sponsored by:	Chelsio Communications
2024-07-22 17:05:55 -04:00
Mark Johnston
b67f248523 nvmf: Use device_set_descf()
No functional change intended.

MFC after:	1 week
2024-06-16 16:37:26 -04:00
John Baldwin
f46d4971b5 nvmf: Handle shutdowns more gracefully
If an association is disconnected during a clean shutdown, abort all
pending and future I/O requests with an error to avoid hangs either due
to filesystem unmounts or a stuck GEOM event.

If an association is connected during a clean shutdown, gracefully
disconnect from the remote controller and close the open queues.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45462
2024-06-05 12:59:28 -07:00
John Baldwin
aacaeeee8e nvmf: Permit failing I/O requests while disconnected
Add a kern.nvmf.fail_on_disconnection sysctl similar to the
kern.iscsi.fail_on_disconnection sysctl.  This causes pending I/O
requests to fail with an error if an association is disconnected
instead of requeueing to be retried once the association is
reconnected.  As with iSCSI, the default is to queue and retry
operations.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45308
2024-06-05 12:59:07 -07:00
John Baldwin
e140f85dc1 nvmf: Rescan namespaces after reconnecting
While a host was disconnected from a remote controller, namespaces
might have been added, removed, or altered properties.  Rescan the
namespaces after reconnecting to detect any such changes.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45461
2024-06-05 12:53:08 -07:00
John Baldwin
f6d434f110 nvmf: Rescan all namespaces if the changed NS log page is too large
Previously this just punted with a warning message.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45460
2024-06-05 12:52:43 -07:00
John Baldwin
8a082ca89f nvmf: Factor out most of nvmf_rescan_ns into a helper routine
This function accepts a namespace ID and associated namespace data
from IDENTIFY and takes care of updating nvmeXnY and ndaZ.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45459
2024-06-05 12:52:24 -07:00
John Baldwin
02ddb305cc nvmf: Refactor nvmf_add_namespaces to be more generic
Rename to nvmf_scan_active_namespaces and accept an additional
callback function and callback argument.  The callback is invoked on
each active namespace enumerated by the active namespace list from the
IDENTIFY command.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45458
2024-06-05 12:51:56 -07:00
John Baldwin
bed59baba2 nvmf: Pass const pointers to namespace data to nvmf_*_ns
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45457
2024-06-05 12:51:37 -07:00
Chuck Tuffli
ce75bfcac9 nvme: Change namespace device name
Changes the device name for NVMe and NVMe-oF namespaces from using "ns"
to "n" to be more compatible with other operating systems. For example,
a device which was previously /dev/nvme0ns1 is now /dev/nvme0n1.

Preserves the existing functionality by creating alias from nvmeXnY to
nvmeXnsY.

Reviewed by:	imp
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D45414
2024-06-01 04:14:14 -07:00
John Baldwin
da4230af3f nvme/f: Use strlcpy instead of strncpy + manual string termination
Reviewed by:	dab, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45153
2024-05-13 12:04:03 -07:00
John Baldwin
50884a0b09 nvmf_transport: Remove invalid assertion
This is leftover from an earlier iteration of the code where 'nt' was
not dynamically allocated but was the passed in 'ops' pointer so was
always alive.

Reported by:	Coverity Scan
CID:	 	1545042
Sponsored by:	Chelsio Communications
2024-05-10 09:13:40 -07:00
John Baldwin
1f029b86bb nvmf: Use strlcpy instead of strncpy to ensure termination
Reported by:	Coverity Scan
CID:	 	1545054
Sponsored by:	Chelsio Communications
2024-05-10 08:56:51 -07:00
John Baldwin
1d425ef341 nvmf: Add explicit alignment for struct nvmf_fabric_cmd
This avoids -Wcast-align warnings from clang when upcasting from
struct nvmf_fabric_cmd to struct nvmf_fabric_prop_set_cmd.

Reported by:	bapt
Sponsored by:	Chelsio Communications
2024-05-06 15:19:39 -07:00
John Baldwin
a7db82cfd9 nvmf_tcp: Correct tests for PDU direction
Add parentheses to ensure the correct order of operations.

Reported by:	GCC
2024-05-06 14:03:48 -07:00
John Baldwin
e75a79f40b nvmf: Remove packing pragmas from nvmf_proto.h
The protocol structures do not need explicit packing and static
assertions verify the size of all the structures as well as the
offsets of several key fields.  The pragma triggers warnings when
building with GCC.

Sponsored by:	Chelsio Communications
2024-05-06 14:03:44 -07:00
John Baldwin
a15f7c96a2 nvmft: The in-kernel NVMe over Fabrics controller
This is the server (target in SCSI terms) for NVMe over Fabrics.
Userland is responsible for accepting a new queue pair and receiving
the initial Connect command before handing the queue pair off via an
ioctl to this CTL frontend.

This frontend exposes CTL LUNs as NVMe namespaces to remote hosts.
Users can ask LUNS to CTL that can be shared via either iSCSI or
NVMeoF.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44726
2024-05-02 16:38:30 -07:00
John Baldwin
a1eda74167 nvmf: The in-kernel NVMe over Fabrics host
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8).  An nvmeX new-bus device is created
at the top-level to represent the remote controller similar to PCI
nvmeX devices for PCI-express controllers.

As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
pass through commands can be submitted to either the namespace devices
or the controller device.  For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.

nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport.  nvmf does not support nvd(4), only nda(4).

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44714
2024-05-02 16:29:37 -07:00
John Baldwin
59144db3fc nvmf_tcp: Add a TCP transport for NVMe over Fabrics
Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c).  One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs.  Thus, the data transfer for a given command is managed entirely
in the transport backend.

Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair.  On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread.  The transmit thread converts
command response capsules into suitable TCP PDUs where each PDU is
described by an mbuf chain that is then queued to the backing socket's
send buffer.  Command capsules can embed data along with the NVMe
command.

On the receive side, a socket upcall notifies the receive kthread when
more data arrives.  Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread.  PDUs such as R2T or data
related PDUs are handled internally, with callbacks invoked if a data
transfer encounters an error, or once the data transfer has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.

struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule-data as described in the NVMeoF
spec.  Data related PDUs such as R2T, C2H, and H2C are associated with
a command buffer except in the case of the send_controller_data
transport method which simply constructs one or more C2H PDUs from the
caller's mbuf chain.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44712
2024-05-02 16:28:47 -07:00
John Baldwin
aa1207ea4f nvmf: Add infrastructure kernel module for NVMe over Fabrics
nvmf_transport.ko provides routines for managing NVMeoF queue pairs
and capsules.  It provides a glue layer between transports (such as
TCP or RDMA) and an NVMeoF host (initiator) and controller (target).

Unlike the synchronous API exposed to the host and controller by
libnvmf, the kernel's transport layer uses an asynchronous API built
on callbacks.  Upper layers provide callbacks on queue pairs that are
invoked for transport errors (error_cb) or anytime a capsule is
received (receive_cb).

Data transfers for a command are usually associated with a callback
that is invoked once a transfer has finished either due to an error
or successful completion.

For an upper layer that is a host, command capsules are allocated and
populated with an NVMe SQE by calling nvmf_allocate_command.  A data
buffer (described by a struct memdesc) can be associated with a
command capsule before it is transmitted via nvmf_capsule_append_data.
This function accepts a direction (send vs receive) as well as the
data transfer callback.  The host then transmits the command via
nvmf_transmit_capsule.  The host must ensure that the data buffer
described by the 'struct memdesc' remains valid until the data
transfer callback is called.  The queue pair's receive_cb callback
should match received response capsules up with previously transmitted
commands.

For the controller, incoming commands are received via the queue
pair's receive_cb callback.  nvmf_receive_controller_data is used to
retrieve any data from a command (e.g. the data for a WRITE command).
It can be called multiple times to split the data transfer into
smaller sizes.  This function accepts an I/O completion callback that
is invoked once the data transfer has completed.
nvmf_send_controller_data is used to send data to a remote host in
response to a command.  In this case a callback function is not used
but the status is returned synchronously.  Finally, the controller can
allocate a response capsule via nvmf_allocate_response populated with
a supplied CQE and send the response via nvmf_transmit_capsule.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44711
2024-05-02 16:28:32 -07:00
John Baldwin
2f7b0de1de nvmft: Add NVMeoF controller routines shared between kernel and userland
This includes functions to validate NVMe Qualified Names, compute an
initial value of the CAP property, validate changes to the CC
property, and populate the Identify Controller data structure for an
I/O controller.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44709
2024-05-02 16:27:53 -07:00
John Baldwin
bbd5b6fe91 nvmf_tcp.h: Internal header shared between userspace and kernel
- Helper macros for specific SGL types used with the TCP transport

- An inline function which validates various fields in TCP PDUs

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44708
2024-05-02 16:27:38 -07:00
John Baldwin
d86edc181a nvmf.h: New header defining ioctls for NVMe over Fabrics
This defines structures, ioctl commands, and related constants used
for both the Fabrics host and controller.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44706
2024-05-02 16:27:13 -07:00
John Baldwin
52d5738dc5 nvmf_proto.h: Add additional types and constants from the 1.1 spec
- Add opcode, command structure, and new error code for Disconnect
  fabrics opcode.

- Add a generic struct nvmf_fabric_command.

- Add constants for special controller ID values.

- Add constants for the cattr field in the Connect command and the
  default value for the kato field in the Connect command.

- Add constants for the offset of controller properties (Fabrics
  version of controller registers).

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44705
2024-05-02 16:26:56 -07:00
John Baldwin
878d102ab9 nvmf_proto.h: Update for use in FreeBSD
- Replace SPDK_STATIC_ASSERT with _Static_assert.

- Remove SPDK_ and spdk_ prefixes from types and constants.

- Switch to using FreeBSD headers, e.g. <dev/nvme/nvme.h> in place of
  "spdk/nvme_spec.h".

- Add a definition of NVME_NQN_FIELD_SIZE (from SPDK's nvme_spec.h).

- Remove constant for the fabrics opcode as this is already present in
  <dev/nvme/nvme.h>.

- Use types from <dev/nvme/nvme.h> for NVMe structures including
  struct nvme_sgl_descriptor, struct nvme_command, and
  struct nvme_completion.

- Use plain uint16_t in place of struct spdk_nvme_status.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44704
2024-05-02 16:26:33 -07:00
John Baldwin
f2e737683e nvmf_proto.h: NVMe over Fabrics protocol definitions
This is a copy of spdk/include/spdk/nvmf_spec.h as of commit
470e851852bb948334a272c9f8de495020fa082f from Intel's SPDK.
Subsequent commits will modify it to be suitable header for the
kernel, but importing the stock file first makes it easier to see
how the resulting header is derived from the original.

Reviewed by:	imp
Obtained from:	SPDK (https://github.com/spdk/spdk.git)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44703
2024-05-02 16:26:16 -07:00