Commit graph

449 commits

Author SHA1 Message Date
Warner Losh
3d89acf590 nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 21:31:20 -06:00
Warner Losh
ce7fac64ba Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert
it and do it again.

This reverts commit d5507f9e43.

Sponsored by:		Netflix
2024-08-15 21:29:53 -06:00
Warner Losh
d5507f9e43 nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 20:22:18 -06:00
Warner Losh
8c44df321c nvme: Add a clarifying comment
While it is easy enough to bounce over to nvme.c from nvme_ctrlr.c to
find this out, I've had to do that several times, so a little bit of
context is quite helpful.

Sponsored by:		Netflix
2024-08-13 16:46:41 -06:00
Warner Losh
d40fc35f93 nvme: Make is_initialized a bool
is_initialized is used as a bool everywhere, and we never do any atomics
with it, so make it really a bool.

Sponsored by:		Netflix
2024-08-13 16:46:41 -06:00
Mark Johnston
d37286b9bf proc: Remove kernel stack swapping support, part 7
Remove some uses of PHOLD which were there only to prevent the process'
threads from being swapped out.

Tested by:	pho
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D46118
2024-07-29 01:43:49 +00:00
Warner Losh
f68c4b4746 nvme: Add bit names for nvme 2.0 for Async Events
NVME 2.0 has defined a number of new bits for async events. Add
defines for them.

Sponsored by:		Netflix
2024-07-24 20:47:26 -06:00
Warner Losh
bb7f7d5b52 nvme: Warn if there's system interrupt issues.
Issue a warning if we have system interrupt issues. If you get this
warning, then we submitted a request, it timed out without an interrupt
being posted, but when we polled the card's completion, we found
completion events. This indicates that we're missing interrupts, and to
date all the times I've helped people track issues like this down it has
been a system issue, not an NVMe driver isseue.

Sponsored by:		Netflix
Reviewed by:		gallatin
Differential Revision:	https://reviews.freebsd.org/D46031
2024-07-23 17:04:03 -06:00
Warner Losh
aa41354349 nvme: Optimize timeout code further
Optimize timeout code based on three observations.

(1) The tr queues are sorted in order of submission, so the first one
    that could time out is the first "real" one on the list.
(2) Timeouts for a given queue are all the same length (well, except
    at startup, where timeout doesn't matter, and when you change it
    at runtime, where timeouts will still happen eventually and the
    difference isn't worth optimizing for).
(3) Calling the ISR races the real ISR and we should avoid that better.

So now, after checking to see if the card is there and working, the
timeout routine scans the pending tracker list until it finds a non-AER
tracker. If the deadline hasn't passed, we return, doing nothing
further.  Otherwise, we call poll completions and then process the list
looking for timed out items.

This should move the timeout routine to touching hardware only when it's
really necessary. It thus avoids racing the normal ISR, while still
timig out stuck transactions quickly enough.

There was also some minor code motion to make all of the above flow more
nicely for the reader.

When interrupts aren't working at all, then this will increase latency
somewhat. But when interrupts aren't working at all, there's bigger
problems and we should poll quite often in that case. That will be
handled in future commits.

Sponsored by: Netflix
Differential Revision:	https://reviews.freebsd.org/D46026
2024-07-23 17:04:02 -06:00
Warner Losh
e6d3ba4be2 nvme: Lock when processing an abort completion command.
When processing an abort completion command, we have to lock. But we
have to lock the qpair of the original transaction (not the abort we're
completing). We do this to avoid races with checking the completion id
to tr mapping array, as well as to manually complete it.

Note: we don't handle the completion status of 'Asked to abort too many
transactions at once.' That will be fixed on subsequent commits. Add a
note to that effect for now since it's a harder problem to solve.

Sponsored by: Netflix
Differential Revision:	https://reviews.freebsd.org/D46025
2024-07-23 17:04:02 -06:00
Warner Losh
86909f7aeb nvme: Always lock and only avoid processing for recovery state
When we lose a race with the timeout code, shift towards waiting for
that timeout code to complete so we can acquire the lock. This way we
can make sure we're in 'normal' mode before processing I/O
completions. If we're not in 'normal' mode, then we're resetting and we
should avoid completions.

Sponsored by: Netflix
Reviewed by: gallatin
Differential Revision:	https://reviews.freebsd.org/D46024
2024-07-23 17:04:02 -06:00
Warner Losh
123e29068e nvme: widen nvme_qpair_manual_complete_request for better errors
Make nvme_qpair_manual_complete_request take dnr as well as a
print_on_error action. Make the status word computation common between
it and nvme_qpair_manual_complete_tracker. And print the error when
we are cancelling the I/O on failure, but not when we're filtering
the I/O after we've failed. Make it private again to nvme_qpair.c.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D46049
2024-07-19 20:56:04 -06:00
Warner Losh
9229b3105d nvme: Fail passthrough commands right away in failed state.
When the drive is failed, we can't send passthrough commands to the
card, so fail them right away. Rearrange the comments to reflect the
current failure paths in the driver.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D46048
2024-07-19 20:55:52 -06:00
Ryan Libby
b195d7498b nvme: avoid gcc -Werror=int-to-pointer-cast on 32-bit arch
Reviewed by:	chuck (previous version), imp
Differential Revision:	https://reviews.freebsd.org/D45750
2024-06-26 20:50:04 -07:00
Warner Losh
1bce7cd885 nvme: Add Linux copatible ioctls
Add the NVME_IOCTL_ID, NVME_IOCTL_ADMIN_CMD, and NVME_IOCTL_IO_CMD Linux
compatible ioctls. These may be run on either an I/O (ns) dev or a nvme
(admin) dev. Linux allows both on either device, and programs use this
and aren't careful about having the right device open. Emulate this
feature, and implement these ioctls. The data is passed in into the
kernel in host byte order (not converted to le). Results are returned in
host order.

The timeout field is ignore, and the metadata and metadata_len fields
must be zero.

The addr field can be null, even when the data_len is non zero (FreeBSD's
ioctl interface prohibits this, Linux's just ignores the inconsistency).

Only the cdw10 is returned from the command: the status is not returned
in 'result' field. XXX need to verify that this is what Linux does on an
error signaled from the drive.

No external include file is yet available for this: most programs that
call this interface either use a linux-specific path <linux/nvme.h> or
have their own private copy of the data. It's unclear the best thing to
do.

Also, create a /dev/nvmeXnY as an alias for /dev/nvmeXnsY.

These changes allow a native build of nvme-cli to work for everything
that doesn't depend on sysfs entries in /sys, calls that use metadata,
send / receive drive data and sed functionality not in our nvme driver.

Sponsored by:		Netflix
Co-Authored-by:		Chuck Tuffli <chuck@freebsd.org>
Reviewed by:		chuck
Differential Revision:	https://reviews.freebsd.org/D45415
2024-06-14 16:40:08 -06:00
Alexander Motin
80b4232924 nvme: Fix panic on detach after ce75bfcac9
MFC after: 2 weeks
2024-06-14 15:32:10 -04:00
Chuck Tuffli
ce75bfcac9 nvme: Change namespace device name
Changes the device name for NVMe and NVMe-oF namespaces from using "ns"
to "n" to be more compatible with other operating systems. For example,
a device which was previously /dev/nvme0ns1 is now /dev/nvme0n1.

Preserves the existing functionality by creating alias from nvmeXnY to
nvmeXnsY.

Reviewed by:	imp
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D45414
2024-06-01 04:14:14 -07:00
Warner Losh
d09ee08f10 nvme: Count number of alginment splits
When possible, we split up I/Os to NVMe drives that advertise a
preferred alignment. Add a counter for this.

Sponsored by:		Netflix
Reviewed by:		chuck, mav
Differential Revision:	https://reviews.freebsd.org/D45311
2024-05-24 08:32:47 -06:00
Warner Losh
0dd84c3b11 nvme: Add comment about where tr->deadline is set
It's easy to overlook the chain of events that lead to tr->deadline
being updated. Add a comment here to explain what otherwise looks like
an oversight w/o careful study.

Sponsored by:		Netflix
2024-05-13 16:14:04 -06:00
Warner Losh
c931cf6af0 nvme: Slight simplification
We don't need to dereference qpair to get the ctrlr pointer each time,
so use the cached value. It's not going to change. No change intended.

Sponsored by:		Netflix
2024-05-13 16:14:04 -06:00
Warner Losh
9db8ca92b9 nvme: Slight reworking this loop to match FreeBSD style
Update the comment for the code, and slightly rework the code in the
'fast exit' paradigm that FreeBSD generally tries to do.

Sponsored by:		Netflix
2024-05-13 16:14:04 -06:00
Warner Losh
5a178b831a nvme: Add locking asserts
nvme_qpair_complete_tracker and nvme_qpair_manual_complete_tracker have
to be called without the qpair lock, so assert its unowned.

Sponsored by:		Netflix
2024-05-13 16:14:03 -06:00
John Baldwin
da4230af3f nvme/f: Use strlcpy instead of strncpy + manual string termination
Reviewed by:	dab, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45153
2024-05-13 12:04:03 -07:00
John Baldwin
01fc488381 nvme: Use strlcpy instead of strncpy to ensure termination
Reviewed by:	dab, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D45152
2024-05-13 12:03:49 -07:00
Warner Losh
e84a75f936 nvme: Add telemetry page definitions
Add definition for page types 7 and 8 for host initiated telemetry and
controller initiated telemetry (they differ by one byte, but that byte
that's defined in the host version is reserved in the controller
version).

Sponsored by:		Netflix
2024-05-11 12:09:50 -06:00
John Baldwin
ebcfab998e nvme: Explicitly align struct nvme_command on an 8 byte boundary
This was already true for most architectures due to uint64_t structure
members.  However, i386 is special in that it only requires 4 byte
alignment for uint64_t.  As a result, casts from struct nvme_command
to struct nvmf_fabric_cmd were raising a "cast increases alignment"
warning on i386.  Explicitly aligning struct nvme_command pacifies
this warning on i386.

Reported by:	rscheff
Sponsored by:	Chelsio Communications
2024-05-08 16:05:39 -07:00
John Baldwin
29d7e39f56 nvme: Bump the alignment of struct nvme_health_information_page to 8
This ensures that embedded uint64_t values used for statistics
counters are aligned when allocating a structure on the stack or as
part of a containing structure.  In particular this quiets
-Waddress-of-packed-member warnings from GCC when compiling the code
in nvmfd to update the stats.

Reported by:	GCC
2024-05-07 13:54:00 -07:00
John Baldwin
5e3e444230 nvme: Add constants for the Fused Operation (FUSE) field in commands
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44845
2024-05-02 16:31:02 -07:00
John Baldwin
d86edc181a nvmf.h: New header defining ioctls for NVMe over Fabrics
This defines structures, ioctl commands, and related constants used
for both the Fabrics host and controller.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44706
2024-05-02 16:27:13 -07:00
Warner Losh
97b77de2d9 nvme: Eliminate intel_log_temp_stats_swapbytes
We can't post a AER for this page, so there's no need to be able to swap
it to host byte order. It's not one of the standard defined pages that
can post via AER, and the vendor's public docs for this temperature page
don't suggest it's possible to get over or under event changes. Since
nvmecontrol no longer needsd the swap routine, remove it since it's
now unused.

Sponsored by:		Netflix
Reviewed by:		chuck
Differential Revision:	https://reviews.freebsd.org/D44659
2024-04-16 21:30:19 -06:00
Brooks Davis
6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Warner Losh
0b8f21e8d1 nvme: Add LPA bits
Add all the bits from the NVMe 2.0 base specification: CMD_EFFECTS to
indicate the commands and effects log page is supported, TELEMETRY to
indicate that the telemetry log pages and protocols are supported,
PERSISTENT_EVENTS to indicate the persistent event log is supported,
LOG_PAGES_PAGE to indicate that various log pages related to log page
and command support are supported: L0, L5, L12, and L13. and
DA4_TELEMETRY to indicate that the DA4 area is supported for telemetry
data.

Sponsored by:		Netflix
2024-04-05 16:53:47 -06:00
John Baldwin
21d3a84db4 nvme: Add NVMe over Fabrics fields to nvme_controller_data
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44448
2024-03-22 17:24:52 -07:00
John Baldwin
7fa8adb8c5 nvme: Add constants for the Controller Attributes field in cdata
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44447
2024-03-22 17:24:31 -07:00
John Baldwin
88ecf154c7 nvme: Add constants and types for the discovery log page
This is used in NVMe over Fabrics to enumerate a list of available
controllers.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44446
2024-03-22 17:24:18 -07:00
John Baldwin
b354bb04cb nvme: Add constants for fields in AER completion dword 0
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44445
2024-03-22 17:24:06 -07:00
John Baldwin
cbda1886ab nvme: Add constants for the extended data for Get Log Page command flag
nvme(4) doesn't check this flag, but Fabrics implementations may need
to set this flag in the log page attributes cdata field.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44444
2024-03-22 17:23:46 -07:00
John Baldwin
b8cb8dd362 nvme: Add constants for the PSDT field in cdw0
This is not used in nvme(4) but is used in NVMe over Fabrics
transports which use SGLs to describe buffers instead of PRPs.

While here, adjust the shift value for the FUSE field to be relative
to the 'fuse' member of 'struct nvme_command'.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44443
2024-03-22 17:23:24 -07:00
John Baldwin
f21a54d190 nvme: Add SGL structure and constants for use in NVMe commands
Fabrics capsules use an SGL structure instead of prp1/2 addresses to
describe the data buffer used for a command.  The SGL structure is
added to a union with the existing prp1/2 fields.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44442
2024-03-22 17:23:09 -07:00
John Baldwin
1931b75e00 nvme: Export constants for min and max queue sizes
These are useful for NVMe over Fabrics.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44441
2024-03-22 17:23:02 -07:00
Warner Losh
fe52c3384c nvme_sim: Add comment about the is_failed test
We only see a request with a failed controller while we're in the
process of failing the controller. Add a comment to that effect.

Sponsored by:		Netflix
2024-03-07 12:05:28 -07:00
Warner Losh
2a2682ee53 nvme: Add SMART WARNING for persistent memory region
NVME 2.0 added persistent memory regions, and this bit reports critical
warnings / errors with those regions.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D44213
2024-03-06 18:38:59 -07:00
Warner Losh
5cdedf676d nvme: Log reset success or failure to devd
We're logging when we start a reset, but not when we complete it, nor
the result. Create now log a success or timed_out event for the reset.
Currently, the only detectable error we have from reset is 'failure to
become ready in time,' though the code looks like it might be more
generic. Log this and if we ever have other failure modes, change the
logging to devd when that happens.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D44211
2024-03-06 18:38:59 -07:00
Warner Losh
4f817fcf6a nvme: Change devctl events for the controller
Change the devctl events slightly for the controller. SMART errors will
log the changed bits in the NVME SMART Critical Warning State as its
event.

Reset will now emit 'event=start'. Soon more.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D44210
2024-03-06 18:38:59 -07:00
Warner Losh
fc3afe9395 nvme: split devctl out to its own function
Split the devctl aspect of things out to its own function in
nvme_ctrlr_devctl_log. In preparing to document this, and based on
actual use, we want something different for the SMART errors, so this
will facilitate that.

Sponsored by:		Netflix
Reviewed by:		chuck, mav
Differential Revision:	https://reviews.freebsd.org/D44209
2024-03-06 18:38:59 -07:00
Warner Losh
c5246cb7b0 nvme: Report only the unknown bits
When we get a smart error that's unknown, report only the unknown
(reserved) bits of the Critical Warning Bitfield.

Sponsored by:		Netflix
2024-03-01 16:04:27 -07:00
John Baldwin
7485926e09 nvme: Firmware revisions in the firmware slot info logpage are ASCII strings
In particular, don't try to byteswap the values as 64-bit integers and
always print a non-empty version as a string.

Reviewed by:	chuck, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44121
2024-03-01 14:18:43 -08:00
John Baldwin
5650bd3fe8 nvme: Use the NVMEF macro to construct fields
Reviewed by:	chuck, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43605
2024-01-29 11:01:13 -08:00
John Baldwin
3a477a9b70 nvme: Add NVMEF helper macro as the inverse of NVMEV
This macro accepts a field name and a value for the field and
constructs the shifted field value.

Reviewed by:	chuck
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43604
2024-01-29 11:00:57 -08:00
John Baldwin
8488fc417f nvme: Use the NVMEM macro instead of expanded versions
A few of these omitted a shift of 0, but this is more consistent.

Reviewed by:	chuck
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43602
2024-01-29 10:59:37 -08:00