opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-02-23 18:00:31 -05:00

Author	SHA1	Message	Date
John Baldwin	5ae4463498	nvme: Fix typo in "Command Aborted by Host" constant name. Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D40763	2023-06-27 10:06:22 -07:00
John Baldwin	9c2203a691	nvme: Tidy up transfer rate settings in XPT_GET_TRAN_SETTINGS. - Replace a magic number with CTS_NVME_VALID_SPEC. - Set the transport and protocol versions the same as for XPT_PATH_INQ. Probably we shouldn't bother with setting the version in the 'spec' member of ccb_trans_settings_nvme at all and use the transport and/or protocol version field instead. Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D40616	2023-06-26 20:32:29 -07:00
Warner Losh	bdc81eeda0	nvme: Switch to nda by default We already run nda by default on all the !x86 architectures. Switch the default to nda. nda created nvd compatibility links by default, so this should be a nop. If this causes problems for your application, set hw.nvme.use_nvd=1 in your loader.conf. Sponsored by: Netflix	2023-06-12 21:41:06 -06:00
Warner Losh	4d846d260e	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix	2023-05-12 10:44:03 -06:00
Alexander Motin	49ebbdb264	Add NAMESPACE MANAGEMENT into admin_opcode[]. MFC after: 1 week	2023-03-08 15:42:31 -05:00
Dag-Erling Smørgrav	9a5acf365d	nvme: Clear the notify flag if the consumer rejects the controller. While here, fix some type mismatch warnings. Reviewed by: imp Sponsored by: Netapp, Inc. Sponsored by: Klara, Inc. MFC after: 1 week	2022-12-20 02:53:38 +01:00
Wanpeng Qian	8ab99dbea1	bhyve: abort and return FEATURE_NOT_SAVEABLE while set feature with a save flag for NVMe controller. Currently bhyve's NVMe controller cannot save feature values cross reboot. It should return a FEATURE_NOT_SAVEABLE error when the command specifies a save flag. Quote from NVMe specification, page 205: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf If the Feature Identifier specified in the Set Features command is not saveable by the controller and the controller receives a Set Features command with the Save bit set to one, then the command shall be aborted with a status of Feature Identifier Not Saveable. Reviewed by: chuck (older version) Approved by: manu (mentor) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32767	2022-11-15 07:48:24 +01:00
Alexander Motin	2a31a06bf1	Add random VMware device IDs. Just to make dmesg look nicer there. MFC after: 1 week	2022-10-20 10:19:24 -04:00
Warner Losh	4982884b99	nvme: Always set deadline to max When a transaction is on the outstanding list, it needs to have a valid timeout value, so set it to infinity before placing it on the list. Place before we put it on the list, even though the list is protected by the qpair lock. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D36920	2022-10-11 12:51:32 -06:00
Alexander Motin	a69c096462	nvme: Print CRD, M and DNR status bits on errors. It may help with some issues debugging. MFC after: 1 week	2022-08-05 10:58:19 -04:00
Gordon Bergling	6e8ab6715d	nvme(4): Fix a typo in a source code comment - s/inaccessable/inaccessible/ MFC after: 3 days	2022-06-04 11:46:03 +02:00
John Baldwin	1093caa1bb	nvme: Remove unused devclass arguments to DRIVER_MODULE.	2022-05-06 15:46:55 -07:00
John Baldwin	82496a256f	nvme: Use devclass_find to lookup the nvme devclass. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D34995	2022-04-21 10:29:14 -07:00
Warner Losh	0fd4cd405b	nvme: Use controller's page size instead of PAGE_SIZE to create qpair When constructing qpair, use the controller's notion of page size rather than the host's PAGE_SIZE. Currently, these are both 4k, but the arm 16k page size support requires decoupling. There's a "hidden" PAGE_SIZE in btoc, so we must change btoc(x) to howmany(x, ctrlr->page_size) to properly count the number of pages (in the drive's world view) needed for various calculations. With these changes, the nvme driver operates at production level load for both host 4k and host 16k page size. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34873	2022-04-15 14:46:19 -06:00
Warner Losh	c5ed67dc90	nvme: Prefer nvme_printf to printf when reporting formatting error Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34872	2022-04-15 14:46:19 -06:00
Warner Losh	3740a8db13	nvme: Further refinements in Host Memory Buffer Sizing Host Memory Buffer units are a mix. For those in the identify structure, the size is in 4kiB chunks. For specifying the buffer description, though, they are in terms of the drive's MPS. Add comments to this effect and change PAGE_SIZE to ctrlr->page_size where needed, as well as correct a mistaken use of NVME_HPS_UNITS in `214df80a9c` as pointed out by rpokala@ after the commit. No functional change is intended, as page_size is still 4k which matches all current hosts' PAGE_SIZE, but to support 16k pages on arm, we need to differentiate these two cases. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34871	2022-04-15 14:46:19 -06:00
Warner Losh	3086efe895	nvme: Remove NVME_MAX_XFER_SIZE, replace inline calculation NVME_MAX_XFER_SIZE used to be a constant (back when MAXPHYS was a constant) to denote the smaller of MAXPHYS or the largest PRP we could encode with our preallocation scheme. However, it's no longer constant since MAXPHYS varies at runtime. In addition, the actual maximum is now based on the drive's currently in use page_size, which is also a runtime expression. As such, remove the define and expand it inline in the one place it's still used in the tree. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34870	2022-04-15 14:46:18 -06:00
Warner Losh	3a468f2010	nvme: Use saved mps when initializing drive Make sure we set the MPS we cached (currently the drives minimum mps) in CC (Controller Configuration) when reinitializing the drive. It must match the page_size that we're going to use. Also retire less specific NVME_PAGE_SHIFT since it's now unused. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34869	2022-04-15 14:46:18 -06:00
Warner Losh	55412ef90a	nvme: Rename min_page_size to page_size and save mps The Memory Page Size sets the basic unit of operation for the drive. We currently set this to the drive's minimum page size, but we could set it to any page size the drive supports in the future. Replace min_page_size (it's now unused for that purpose) with page_size to reflect this and cache the MPS we want to use. Use NVME_MPS_SHIFT to compute page_size. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34868	2022-04-15 14:46:18 -06:00
Warner Losh	6e3deec8ca	nvme: Base maximum data transfer size directly on MPSMIN in cap_hi Calculate the maximum transfer size based on the MPSMIN we have in our cached copy of cap_hi rather than using min_page_size in the controller. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34867	2022-04-15 14:46:18 -06:00
Warner Losh	a7218e7a6b	nvme: Fix old intel alignment size The intel raid stripe alignment parameter is based on CAP.MPSMIN, so use that directly now that we have it available. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34866	2022-04-15 14:46:18 -06:00
Warner Losh	e66c1b5185	nvme: Define NVME_MPS_SHIFT The memory page size (MPS) is expressed in terms of a 2^(number + 12) and other items in the system inherit this. Create a define rather than sprinkling 12 everywhere. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34865	2022-04-15 14:46:18 -06:00
Gordon Bergling	dfa01f4f98	nvme(4): Fix a typo in a source code comment - s/is is/is/ MFC after: 3 days	2022-04-09 09:24:34 +02:00
Warner Losh	214df80a9c	nvme: new define for size of host memory buffer sizes The nvme spec defines the various fields that specify sizes for host memory buffers in terms of 4096 chunks. So, rather than use a bare 4096 here, use NVME_HMB_UNITS. This is explicitly not the host page size of 4096, nor the default memory page size (mps) of the NVMe drive, but its own thing and needs its own define. No functional change is intended, only the logical spelling of 4k. Sponsored by: Netflix	2022-04-08 23:05:25 -06:00
Warner Losh	161fcf7994	nvme: Publish the drive's capabilities Add cap_lo and cap_hi sysctl to each nvme drive. This publishes the raw capabilities of the drive. Now we can only discover these with bootverbose. Sponsored by: Netflix	2022-03-31 21:13:16 -06:00
Warner Losh	6af6a52ee4	nvme: Save cap_lo and cap_hi Save the capabilities for the drive. Sponsored by: Netflix	2022-03-31 21:12:38 -06:00
Warner Losh	a70b5660f3	nvme: MPS is a power of two, not a size / 8k Setting MPS in the CC should be a power of 2 number (it specifies the page size of the host is 2^(12+MPS)), so adjust the calculation. There is no functional change because we do not support any architectures != 4k pages (yet). Other changes are needed for architectures with 16k or 64k pages, especially when the underlying NVMe drive doesn't support that page size (Most drives support a range that's small, and many only support 4k), but let's at least do this calculation correctly. 12 - 12 is just as much 0 as 4096 >> 13 is :) Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D34707	2022-03-31 21:12:38 -06:00
Chuck Tuffli	c2318cf80a	nvme: fix spelling of Namespace Fix spelling of a macro definition. Reviewed by: mav, imp Differential Revision: https://reviews.freebsd.org/D34330	2022-02-21 10:34:46 -08:00
Chuck Tuffli	e71afa1202	nvme: Add OAES bit-field definitions Create definitions for the Optional Asynchronous Events Supported (OAES) values. Also adds a helper macro for the common use case of "mask and shift". E.g. value = NVME_CTRLR_DATA_OAES_NS_ATTR_MASK << NVME_CTRLR_DATA_OAES_NS_ATTR_SHIFT; becomes value = NVMEB(NVME_CTRLR_DATA_OAES_NS_ATTR); Reviewed by: mav, imp Differential Revision: https://reviews.freebsd.org/D34300	2022-02-21 10:34:14 -08:00
Alexander Motin	b3c9b6060f	nvme: Do not rearm timeout for commands without one. Admin queues almost always have several ASYNC_EVENT_REQUEST outstanding. They have no timeouts, but their presence in qpair->outstanding_tr caused useless timeout callout rearming twice a second. While there, relax timeout callout period from 0.5s to 0.5-1s to improve aggregation. Command timeouts are measured in seconds, so we don't need to be precise here. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33781	2022-01-07 12:59:16 -05:00
Warner Losh	8f07932272	nvme_sim: Only report PCI related stats when we can For AHCI attached devices, we report the location and identification information of the AHCI controller that we're attached to. We also don't report link speed in that case, since we can't get to the PCIe config space registers to find that out. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D33287	2021-12-06 10:23:40 -07:00
Warner Losh	7cf8d63c88	nvme_ahci: Mark AHCI devices as such in the controller Add a quirk to flag AHCI attachment to the controller. This is for any of the strategies for attaching nvme devices as children of the AHCI device for Intel's RAID devices. This also has a side effect of cleaning up resource allocation from failed nvme_attach calls now. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D33285	2021-12-06 10:23:40 -07:00
Warner Losh	053f8ed6eb	nvme: Move to a quirk for the Intel alignment data Prior to NVMe 1.3, Intel produced a series of drives that had performance alignment data in the vendor specific space since no standard had been defined. Move testing the versions to a quirk so the NVMe NS code doesn't know about PCI device info. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D33284	2021-12-06 10:23:40 -07:00
Gordon Bergling	5f8ccf6515	nvme(4): Correct a typo in a sysctl description - s/printting/printing/ MFC after: 3 days	2021-11-30 10:26:25 +01:00
Warner Losh	2ec165e3f0	nvme: Reduce traffic to the doorbell register Reduce traffic to doorbell register when processing multiple completion events at once. Only write it at the end of the loop after we've processed everything (assuming we found at least one completion, even if that completion wasn't valid). Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32470	2021-10-14 08:44:37 -06:00
Warner Losh	18dc12bfd2	nvme: Restore hotplug warning Restore hotplug warning in recovery state machine. No functional change other than what message gets printed. Sponsored by: Netflix	2021-10-12 14:26:54 -06:00
Warner Losh	83581511d9	nvme: Use adaptive spinning when polling for completion or state change We only use nvme_completion_poll in the initialization path. The commands they queue and wait for finish quickly as they involve no I/O to the drive's media. These commands take about 20-200 microseconds each. Set the wait time to 1us and then increase it by 1.5 each successive iteration (max 1ms). This reduces initialization time by 80ms in cperciva's tests. Use this same technique waiting for RDY state transitions. This saves another 20ms. In total we're down from ~330ms to ~2ms. Tested by: cperciva Sponsored by: Netflix Reviewed by: mav Differential Review: https://reviews.freebsd.org/D32259	2021-10-01 19:17:55 -06:00
Warner Losh	4b3da659bf	nvme: Only reset once on attach. The FreeBSD nvme driver has reset the nvme controller twice on attach to address a theoretical issue assuring the hardware is in a known state. However, experience has shown the second reset is unnecessary and increases the time to boot. Eliminate the second reset. Should there be a situation when you need a second reset (for buggy or at least somewhat out of the mainstream hardware), the hardware option NVME_2X_RESET will restore the old behavior. Document this in nvme(4). If there's any trouble at all with this, I'll add a sysctl tunable to control it. Sponsored by: Netflix Reviewed by: cperciva, mav Differential Revision: https://reviews.freebsd.org/D32241	2021-10-01 11:09:34 -06:00
Warner Losh	e5e26e4a24	nvme: Remove pause while resetting After some study of the code and the standard, I think we can just drop the pause(), unconditionally. If we're not initialized, then there's nothing to wait for from a software perspective. If we are initialized, then there might be outstanding I/O. If so, then the qpair 'recovery state' will transition to WAITING in nvme_ctrlr_disable_qpairs, which will ignore any interrupts for items that complete before we complete the reset by setting cc.en=0. If we go on to fail the controller, we'll cancel the outstanding I/O transactions. If we reset the controller, the hardware throws away pending transactions and we retry all the pending I/O transactions. Any transactions that happened to complete before cc.en=0 will have the same effect in the end (doing the same transaction twice is just inefficient, it won't affect the state of the device any differently than having done it once). The standard imposes no wait times here, so it isn't needed from that perspective. Unanswered Question: Do we need to disable interrupts while we disable in legacy mode, since those are level-sensitive? Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32248	2021-10-01 11:09:05 -06:00
Warner Losh	77054a897f	nvme: Explain a workaround a little better The "don't touch the mmio of the drive after we do an EN 1->0 transition" workaround is only for a tiny number of drives that have this unfortunate issue. Sponsored by: Netflix	2021-10-01 10:56:10 -06:00
Warner Losh	a245627a4e	nvme_ctrlr_enable: Small style nits Rewrite the nested if's using the preferred FreeBSD style for branches of ifs that return. NFC. Minor tweaks to the comments to better fit new code layout. Sponsored by: Netflix Reviewed by: mav, chuck (prior rev, but comments rolled in) Differential Revision: https://reviews.freebsd.org/D32245	2021-10-01 10:56:10 -06:00
Warner Losh	26259f6ab9	nvme: Use MS_2_TICKS rather than rolling our own Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32246	2021-10-01 10:56:10 -06:00
Warner Losh	d5fca1dc1d	nvme_ctrlr_enable: Remove unnecessary 5ms delays Remove the 5ms delays after writing the administrative queue registers. These delays are from the very earliest days of the driver (they are in the first commit) and were most likely vestiges of the Chatham NVMe prototype card that was used to create this driver. Many of the workarounds necessary for it aren't necessary for standards compliant cards. The original driver had other areas marked for Chatham, but these were not. They are unneeded. There's three lines of supporting evidence. First, the NVMe standards make no mention of a delay time after these registers are written. Second, the Linux driver doesn't have them, even as an option. Third, all my nvme cards work w/o them. To be safe, add a write barrier between setting up the admin queue and enabling the controller. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32247	2021-10-01 10:56:10 -06:00
Warner Losh	36a87d0c6f	nvme: Sanity check completion id Make sure the completion ID is in the range of [0..num_trackers) since the values past the end of the act_tr array are never going to be valid trackers and will lead to pain and suffering if we try to dereference them to get the tracker or to set the tracker back to NULL as we complete the I/O. Sponsored by: Netflix Reviewed by: mav, chs, chuck Differential Revision: https://reviews.freebsd.org/D32088	2021-09-28 21:21:50 -06:00
Warner Losh	587aa25525	nvme: count number of ignored interrupts Count the number of times we're asked to process completions, but that we ignore because the state of the qpair isn't in RECOVERY_NONE. Sponsored by: Netflix Reviewed by: mav, chuck Differential Revision: https://reviews.freebsd.org/D32212	2021-09-28 21:18:00 -06:00
Warner Losh	7d5eebe0f4	nvme: Add sanity check for phase on startup. The proper phase for the qpair right after reset in the first interrupt is 1. For it, make sure that we're not still in phase 0. This is an illegal state to be processing interrupts and indicates that we've failed to properly protect against a race between initializing our state and processing interrupts. Modify stat resetting code so it resets the number of interrupts to 1 instead of 0 so we don't trigger a false positive panic. Sponsored by: Netflix Reviewed by: cperciva, mav (prior version) Differential Revision: https://reviews.freebsd.org/D32211	2021-09-28 21:18:00 -06:00
Warner Losh	fa81f3731d	nvme: start qpair in state RECOVERY_WAITING An interrupt happens on the admin queue right away after the reset, so as soon as we enable interrupts, we'll get a call to our interrupt handler. It is safe to ignore this interrupt if we're not yet initialized, or to process it if we are. If we are initialized, we'll see there's no completion records and return. If we're not, we'll process no completion records and return. Either way, nothing is processed and nothing is lost. Until we've completely setup the qpair, we need to avoid processing completion records. Start the qpair in the waiting recovery state so we return immediately when we try to process completions. The code already sets it to 'NONE' when initialization is complete. It's safe to defer completion processing here because we don't send any commands before the initialization of the software state of the qpair is complete. And even if we were to somehow send a command prior to that completing, the completion record for that command would be processed when we send commands to the admin qpair after we've setup the software state. There's no good central point to add an assert for this last condition. This fixes a KASSERT "received completion for unknown cmd" panic on boot. Fixes: `502dc84a8b` Sponsored by: Netflix Reviewed by: mav, cperciva, gallatin Differential Revision: https://reviews.freebsd.org/D32210	2021-09-28 21:16:19 -06:00
Warner Losh	502dc84a8b	nvme: Use shared timeout rather than timeout per transaction Keep track of the approximate time commands are 'due' and the next deadline for a command. Twice a second, wake up to see if any commands have entered timeout. If so, quiesce and then enter a recovery mode half the timeout further in the future to allow the ISR to complete. Once we exit recovery mode, we go back to operations as normal. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D28583	2021-09-23 16:42:08 -06:00
Warner Losh	4b977e6dda	nvme/nda: Fail all nvme I/Os after controller fails Once the controller has failed, fail all I/O w/o sending it to the device. The reset of the nvme driver won't schedule any I/O to the failed device, and the controller is in an indeterminate state and can't accept I/O. Fail both at the top end of the sim and the bottom end. Don't bother queueing up the I/O for failure in a different task. Reviewed by: chuck Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31341	2021-09-17 16:09:21 -06:00
Colin Percival	bad42df9bf	Add some nvme initialization routines to TSLOG About 335 ms of EC2 instance boot time is being spent here.	2021-09-05 12:48:43 -07:00


363 commits