Commit graph

4938 commits

Author SHA1 Message Date
Martin Matuska
75e1fea68a zfs: merge openzfs/zfs@1147a2797
Notable upstream pull request merges:
 #16209 --multi-- icp: rip out everything we don't use
 #16230 20c8bdd85 FreeBSD: Update use of UMA-related symbols in
                  arc_available_memory
 #16242 121a2d335 FreeBSD: unregister mountroot eventhandler on unload
 #16258 5de3ac223 vdev_open: clear async fault flag after reopen
 #16270 436731276 zvol: Fix suspend lock leaks
 #16273 c87cb22ba head_errlog: fix use-after-free
 #16284 f72e081fb FreeBSD: Use a statement expression to implement
                  SET_ERROR()
 #16300 a10faf5ce FreeBSD: Use the new freeuio() helper to free dynamically
                  allocated UIOs
 #16302 a7fc4c85e zstd: don't call zstd_mempool_reap if there are no buffers
 #16334 dc91e7452 zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly

Obtained from:	OpenZFS
OpenZFS commit: 1147a27978
2024-07-18 10:02:12 +02:00
Warner Losh
e9ac41698b Remove residual blank line at start of Makefile
This is a residual of the $FreeBSD$ removal.

MFC After: 3 days (though I'll just run the command on the branches)
Sponsored by: Netflix
2024-07-15 16:43:39 -06:00
Mark Johnston
7cd9131591 vmm: Conditionalize addition of opt_*.h headers
These are only included in the amd64 vmm code, so it doesn't make sense
to list them unconditionally.

PR:		280171
Reviewed by:	wosch, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D45964
2024-07-14 14:29:14 -04:00
Konstantin Belousov
ef2a572bf6 ipsec_offload: kernel infrastructure
Inline IPSEC offload moves almost whole IPSEC processing from the
CPU/MCU and possibly crypto accelerator, to the network card.

The transmitted packet content is not touched by CPU during TX
operations, kernel only does the required policy and security
association lookups to find out that given flow is offloaded, and then
packet is transmitted as plain text to the card. For driver convenience,
a metadata is attached to the packet identifying SA which must process
the packet. Card does encryption of the payload, padding, calculates
authentication, and does the reformat according to the policy.

Similarly, on receive, card does the decapsulation, decryption, and
authentification.  Kernel receives the identifier of SA that was
used to process the packet, together with the plain-text packet.

Overall, payload octets are only read or written by card DMA engine,
removing a lot of memory subsystem overhead, and saving CPU time because
IPSEC algos calculations are avoided.

If driver declares support for inline IPSEC offload (with the
IFCAP2_IPSEC_OFFLOAD capability set and registering method table struct
if_ipsec_accel_methods), kernel offers the SPD and SAD to driver.
Driver decides which policies and SAs can be offloaded based on
hardware capacity, and acks/nacks each SA for given interface to
kernel.  Kernel needs to keep this information to make a decision to
skip software processing on TX, and to assume processing already done
on RX.  This shadow SPD/SAD database of offloads is rooted from
policies (struct secpolicy accel_ifps, struct ifp_handle_sp) and SAs
(struct secasvar accel_ipfs, struct ifp_handle_sav).

Some extensions to the PF_KEY socket allow to limit interfaces for
which given SP/SA could be offloaded (proposed for offload).  Also,
additional statistics extensions allow to observe allocation/octet/use
counters for specific SA.

Since SPs and SAs are typically instantiated in non-sleepable context,
while offloading them into card is expected to require costly async
manipulations of the card state, calls to the driver for offload and
termination are executed in the threaded taskqueue.  It also solves
the issue of allocating resources needed for the offload database.
Neither ipf_handle_sp nor ipf_handle_sav do not add reference to the
owning SP/SA, the offload must be terminated before last reference is
dropped.  ipsec_accel only adds transient references to ensure safe
pointer ownership by taskqueue.

Maintaining the SA counters for hardware-accelerated packets is the
duty of the driver.  The helper ipsec_accel_drv_sa_lifetime_update()
is provided to hide accel infrastructure from drivers which would use
expected callout to query hardware periodically for updates.

Reviewed by:	rscheff	(transport, stack integration), np
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44219
2024-07-12 07:27:58 +03:00
Eric Melville
4c6cf054c9 Stop forcing -g in mpi3mr module because it breaks non-debug install.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1314
2024-07-07 05:43:29 -06:00
Warner Losh
6a467c783d tpm: Fix standalone build
We're building ACPI, so	we need	-DDEV_ACPI on CFLAGS. Nomally, the
kernel config brings this in, but there's no kernel directory for the
standalone build.

Sponsored by:		Netflix
2024-06-24 23:34:59 -06:00
Shawn Anastasio
3465f14dac ossl: Add support for powerpc64/powerpc64le
Summary:
Add support for building ossl(4) on powerpc64* by implementing ossl_cpuid and
other support functions for powerpc. The required assembly files for ppc were
already present in-tree.

Test Plan: The changes were tested using the in-tree tools/tools/crypto/cryptocheck.c tool on both powerpc64 and powerpc64le on a POWER9 system.

Reviewed by:	#powerpc, jhibbits, jhb
Differential Revision: https://reviews.freebsd.org/D41837
2024-06-21 03:29:04 -04:00
Mark Johnston
ddf0ed09bd sdt: Implement SDT probes using hot-patching
The idea here is to avoid a memory access and conditional branch per
probe site.  Instead, the probe is represented by an "unreachable"
unconditional function call.  asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record.  Each SDT probe carries a list
of tracepoints.

When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label.  The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general.  The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.

Per gallatin@ in D43504, this approach has less overhead when probes are
disabled.  To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case.  It could be re-added later if need be.

The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
   names.  The SDT macros let the programmer specify the function and
   module names, but this is really a bug and shouldn't have been
   allowed.  The intent was to be able to have the same probe in
   multiple functions and to let the user restrict which probes actually
   get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
   to include blocks of code in the out-of-line path.  For example:

	if (SDT_PROBES_ENABLED()) {
		int reason = CLD_EXITED;

		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
		SDT_PROBE1(proc, , , exit, reason);
	}

could be written

	SDT_PROBE1_EXT(proc, , , exit, reason,
		int reason;

		reason = CLD_EXITED;
		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
	);

In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.

Reviewed by:	Domagoj Stolfa, avg
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D44483
2024-06-19 16:57:41 -04:00
Doug Rabson
e97ad33a89 Add an implementation of the 9P filesystem
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.

Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.

To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:

	vfs.root.mountfrom="p9fs:sharename"

for non-root filesystems add something like this to /etc/fstab:

	sharename /mnt p9fs rw 0 0

In both examples, substitute the share name used on the bhyve command
line.

The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations.  This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.

Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
2024-06-19 13:12:04 +01:00
Mark Johnston
ab250b02ba bnxt: Use a simpler test for 32-bit platforms
Suggested by:	jrtc27
Fixes:		c867ba7288 ("bnxt: Do not compile on 32-bit platforms")
2024-06-13 21:18:26 -04:00
Andrew Turner
63f7a38343 vmm: Only link the arm64 hyp code in vmm.ko once
This code runs at EL2 while the kernel runs at EL1. We build these
files for EL2 through a dependency in vmm_hyp_blob.elf.full so there
is no need to include them in SRCS.

Reviewed by:	imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45467
2024-06-10 15:16:10 +00:00
Souradeep Chakrabarti
2b887687ed Hyper-V: TLB flush enlightment using hypercall
Currently FreeBSD uses IPI based TLB flushing for remote
TLB flushing. Hyper-V allows hypercalls to flush local and
remote TLB. The use of Hyper-V hypercalls gives significant
performance improvement in TLB operations.

This patch set during test has shown near to 40 percent
TLB performance improvement.

Also this patch adds rep hypercall implementation as well.

Reviewed by:	whu, kib
Tested by:	whu
Authored-by:	Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by:	Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D45521
2024-06-07 07:56:07 +00:00
Chandrakanth patil
3f3a15543a mpi3mr: Divert large WriteSame IOs to firmware if unmap and ndob bits are set
Firmware advertises the transfer lenght for writesame commands to driver during init.
So for any writesame IOs with ndob and unmap bit set and transfer lengh is greater
than the max write same length specified by the firmware, then direct those commands
to firmware instead of hardware otherwise hardware will break.

Reviewed by:            imp
Approved by:            imp
Differential revision:  https://reviews.freebsd.org/D44452
2024-06-06 10:39:15 +00:00
Andrew Turner
bed65d85c6 linux64: Fix the build on arm64 with bti checking
When we enable checking for BTI on arm64 we need to include an ELF
note in all object files linked into a module.

As using objcopy from a binary to an ELF object file doesn't add the
note switch to using .incbin from an assembly file. This allows us to
add the needed note without affecting the included object.

Reviewed by:	imp, kib, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45468
2024-06-05 09:23:40 +00:00
Andrew Turner
c2e0d56f5e arm64: Support BTI checking in most of the kernel
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
 - Add the BTI note to the remaining assembly files
 - Mark ptrauth.c as protected by BTI
 - Disable -zbti-report for vmm hypervisor switching code as it's not
   used there.

The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.

Reviewed by:	imp, kib, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45466
2024-06-05 09:23:40 +00:00
Martin Matuska
aca928a50a zfs: merge openzfs/zfs@e2357561b
Notable upstream pull request merges:
 #15940 41ae864b6 Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN
 #16128 5137c132a zpool import output is not formated properly
 #16138 efbef9e6c FreeBSD: Add zfs_link_create() error handling
 #16146 04bae5ec9 Disable high priority ZIO threads on FreeBSD and Linux
 #16151 cc3869153 zfs_ioc_send: use a dedicated taskq thread for send
 #16151 adda768e3 spa: remove spa_taskq_dispatch_sync()
 #16151 515c4dd21 spa: flatten spa_taskq_dispatch_ent()
 #16151 0a543db37 spa_taskq_dispatch_ent: simplify arguments
 #16153 975a13259 Add support for parallel pool exports
 #16153 89acef992 Simplified the scope of the namespace lock
 #16159 136c05321 ZAP: Fix leaf references on zap_expand_leaf() errors
 #16162 af5dbed31 Fix scn_queue races on very old pools
 #16165 3400127a7 Fix ZIL clone records for legacy holes
 #16167 414acbd37 Unbreak FreeBSD cross-build on MacOS broken in 051460b8b
 #16172 eced2e2f1 libzfs: Fix mounting datasets under thread limit pressure
 #16178 b64afa41d Better control the thread pool size when mounting datasets
 #16181 fa99d9cd9 zfs_dbgmsg_print: make FreeBSD and Linux consistent
 #16191 e675852bc dbuf: separate refcount calls for dbuf and dbuf_user
 #16198 a043b60f1 Correct level handling in zstream recompress
 #16204 34906f8bb zap: reuse zap_leaf_t on dbuf reuse after shrink
 #16206 d0aa9dbcc Use memset to zero stack allocations containing unions
 #16207 8865dfbca Fix assertion in Persistent L2ARC
 #16208 08648cf0d Allow block cloning to be interrupted by a signal
 #16210 e2357561b FreeBSD: Add const qualifier to members of struct
        opensolaris_utsname
 #16214 800d59d57 Some improvements to metaslabs eviction
 #16216 02c5aa9b0 Destroy ARC buffer in case of fill error
 #16225 01c8efdd5 Simplify issig()

Obtained from:	OpenZFS
OpenZFS commit:	e2357561b9
2024-05-31 11:26:50 +02:00
Emmanuel Vadot
304ac69eca puc: Make kernel module working
We need uart_bus_puc.c in the module for it to work.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2024-05-29 19:07:51 +02:00
Warner Losh
175b2c00a6 Fix bnxt build in LINT
LINT includes bnxt_re driver. Adjust the path in files, add missing
files and add a new BNXT_C to build (which thinly wraps OFED version
with bnxt specicif stuff).

Sponsored by:		Netflix
Fixes: acd884dec9 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
2024-05-29 09:49:53 -06:00
Mark Johnston
c867ba7288 bnxt: Do not compile on 32-bit platforms
The new bnxt_re driver doesn't compile on any of them (it uses writeq()
from the LinuxKPI, which isn't implemented there), and had already been
disconnected from the build on i386.

Reported by:	Jenkins
Fixes:	acd884dec9 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
2024-05-28 09:12:52 -04:00
Mark Johnston
bf56e8b9c8 bnxt: Add a module makefile to fix the build
Fixes:	35b53f8c98 ("bnxt_en: Add PFC, ETS & App TLVs protocols support")
2024-05-28 08:02:19 -04:00
Chandrakanth patil
faeff3b851 bnxt_{en/re}: Update bnxt_en and bnxt_re Makefile
Reviewed by:            imp
Approved by:            imp
Differential revision:  https://reviews.freebsd.org/D45202
2024-05-28 10:36:11 +00:00
Sumit Saxena
acd884dec9 RDMA/bnxt_re: Add bnxt_re RoCE driver
This patch introduces the RoCE driver for the
Broadcom NetXtreme-E 10/25/50/100/200G RoCE HCAs.

The RoCE driver is a two part driver that relies
on the bnxt_en NIC driver to operate. The changes
needed in the bnxt_en driver is included through
another patch "L2-RoCE driver communication interface"
in this set.

Presently, There is no user space support, Hence
recommendation to use the krping kernel module for
testing. User space support will be incorporated in
subsequent patch submissions.

Reviewed by:            imp
Approved by:            imp
Differential revision:  https://reviews.freebsd.org/D45011
2024-05-28 10:36:11 +00:00
Chandrakanth patil
050d28e13c bnxt_en: L2-RoCE driver communication interface
- Added Aux bus support for RoCE.
- Implemented the ulp ops that are required by RoCE driver.
- Restructure context memory data structures
- DBR pacing support

Reviewed by:            imp
Approved by:            imp
Differential revision:  https://reviews.freebsd.org/D45006
2024-05-28 10:36:10 +00:00
Chandrakanth patil
35b53f8c98 bnxt_en: Add PFC, ETS & App TLVs protocols support
Created new directory "bnxt_en" in /dev/bnxt and /modules/bnxt
and moved source files and Makefile into respective directory.

ETS support:

   - Added new files bnxt_dcb.c & bnxt_dcb.h
   - Added sysctl node 'dcb' and created handlers 'ets' and
     'dcbx_cap'
   - Add logic to validate user input and configure ETS in
     the firmware
   - Updated makefile to include bnxt_dcb.c & bnxt_dcb.h

PFC support:

   - Created sysctl handlers 'pfc' under node 'dcb'
   - Added logic to validate user input and configure PFC in
     the firmware.

App TLV support:

   - Created 3 new sysctl handlers under node 'dcb'
       - set_apptlv (write only): Sets a specified TLV
       - del_apptlv (write only): Deletes a specified TLV
       - list_apptlv (read only): Lists all APP TLVs configured
   - Added logic to validate user input and configure APP TLVs
     in the firmware.

Added Below DCB ops for management interface:

   - Set PFC, Get PFC, Set ETS, Get ETS, Add App_TLV, Del App_TLV
     Lst App_TLV

Reviewed by:            imp
Approved by:            imp
Differential revision:  https://reviews.freebsd.org/D45005
2024-05-28 10:15:29 +00:00
Lexi Winter
0e2ce86627 ipfw: don't build the module if INET not in kernel
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1255
2024-05-24 22:21:24 -06:00
Baptiste Daroussin
8aac90f18a mac_do: add a new MAC/do policy and mdo(1) utility
This policy enables a user to become another user without having to be
root (hence no setuid binary). it is configured via rules using sysctl
security.mac.do.rules

For example:
security.mac.do.rules=uid=1001:80,gid=0:any

The above rule means the user identifier by the uid 1001 is able to
become user 80
Any user of the group 0 are allowed to become any user on the system.

The mdo(1) utility expects the MAC/do policy to be installed and its
rules defined.

Reviewed by:	des
Differential Revision:	https://reviews.freebsd.org/D45145
2024-05-22 14:01:41 +02:00
Justin Hibbits
62adeb92df tpm: Add new tpm_bus.c to module Makefile
Reported by:	eduardo@
Fixes:		c2e9c5bbf0 ("tpm: Refactor TIS and add a SPI attachment")
2024-05-17 12:57:38 -04:00
Lexi Winter
304a03275a sys/modules/dpdk_lpm4: do not build without INET
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1236
2024-05-16 11:16:29 -06:00
Warner Losh
c5f906d32d linux: Make module standalone-buildable
Add opt_inet.h and opt_usb.h to make linux module buildable standalone.

Sponsored by:		Netflix
2024-05-11 16:35:54 -06:00
Dan McGregor
8c2f6c3be0 Address module reproducibility issues
Use .PATH & bare filename. This prevents the real source path from
being included in the built object, which improves reproducibility.

Signed-off-by: Dan McGregor <dan.mcgregor@usask.ca>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1211
2024-05-09 17:37:56 -06:00
Florian Walpen
5687c71d5f snd_hdsp(4): RME HDSP 9632 and HDSP 9652 sound card driver.
Add a sound(4) bridge device driver for the RME HDSP 9632 and HDSP 9652
sound cards. These cards require a nowadays rare PCI 32bit (not PCIe)
slot, but still see use due to their value and wealth of features.
The HDSP 9632 is mostly comparable to the newer HDSPe AIO, while the
HDSP 9652 is similar to the HDSPe RayDAT. These HDSPe PCIe cards are
supported by the snd_hdspe(4) driver which was taken as a starting point
for development of snd_hdsp(4).

Implementation is kept separately due to substantial differences in
hardware configuration and to allow easy removal in case PCI 32bit
support would be phased out in the future.

The snd_hdsp(4) kernel module is not enabled by default, and can be
loaded at runtime with kldload(8) or during boot via loader.conf(5).
Basic operation was tested with both cards, not including all optional
cable connectors and expansion boards. Features should be roughly on par
with the snd_hdspe(4) supported cards.

Reviewed by:	christos, br
Differential Revision:	https://reviews.freebsd.org/D45112
2024-05-09 19:36:40 +01:00
Peter Jeremy
b54d4a1627
dtb: rockchip: Add Radxa ROCK 4C Plus to the build.
The ROCK 4C Plus is a cost-reduced variant of the ROCK Pi 4 based on
the RockChip RK3399-T.

Reviewed by:	manu
MFC after:	1 week
Differential Revision:	<https://reviews.freebsd.org/D45110
2024-05-08 18:06:42 +10:00
Poul-Henning Kamp
d3831ca8e3 Remove lingering geom_bde references. 2024-05-07 09:25:23 +00:00
Poul-Henning Kamp
74be648512 Disconnect GBDE from the build. (Per earlier announcements of retirement.) 2024-05-07 05:19:03 +00:00
HP van Braam
34db47a9db aic7xxx: aicasm correct include file
aicasm just puts the value of the "-i" passed include file in the
generated file with quotes around it. This means that there are manual
edits made to aic7xxx_reg_print.c and aic79xx_reg_print.c

now we check to see if the value passed to '-i' starts with a '<', if it
does don't output the quotes.

Signed-off-by: HP van Braam <hp@tmm.cx>
Reviewed by: imp (minor code simplification)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1209
2024-05-04 08:39:02 -06:00
Justin Hibbits
c2e9c5bbf0 tpm: Refactor TIS and add a SPI attachment
Summary:
Though mostly used in x86 devices, TPM can be used on others, with a
direct SPI attachment.  Refactor the TPM 2.0 driver set to use an
attachment interface, and implement a SPI bus interface.

Test Plan:
Tested on a Raspberry Pi 4, with a GeeekPi TPM2.0 module (SLB9670
TPM) using security/tpm2-tools tpm2_getcaps for very light testing against the
spibus attachment.

Reviewed by:	kd
Obtained from:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D45069
2024-05-03 16:26:11 -04:00
Martin Matuska
b985c9cafd zfs: merge openzfs/zfs@8f1b7a6fa
Notable upstream pull request merges:
 #15839 c3f2f1aa2 vdev probe to slow disk can stall mmp write checker
 #15888 5044c4e3f Fast Dedup: ZAP Shrinking
 #15996 db499e68f Overflowing refreservation is bad
 #16118 67d13998b Make more taskq parameters writable
 #16128 21bc066ec Fix updating the zvol_htable when renaming a zvol
 #16130 645b83307 Improve write issue taskqs utilization
 #16131 8fd3a5d02 Slightly improve dnode hash
 #16134 a6edc0adb zio: try to execute TYPE_NULL ZIOs on the current task
 #16141 b28461b7c Fix arcstats for FreeBSD after zfetch support

Obtained from:	OpenZFS
OpenZFS commit:	8f1b7a6fa6
2024-05-03 18:05:08 +02:00
John Baldwin
a15f7c96a2 nvmft: The in-kernel NVMe over Fabrics controller
This is the server (target in SCSI terms) for NVMe over Fabrics.
Userland is responsible for accepting a new queue pair and receiving
the initial Connect command before handing the queue pair off via an
ioctl to this CTL frontend.

This frontend exposes CTL LUNs as NVMe namespaces to remote hosts.
Users can ask LUNS to CTL that can be shared via either iSCSI or
NVMeoF.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44726
2024-05-02 16:38:30 -07:00
John Baldwin
0c4ee619df ctl: Support for NVMe commands
- Add support for queueing and executing NVMe admin and NVM commands
  via ctl_run and ctl_queue.  This requires fixing a few places that
  were SCSI-specific to add NVME logic.

- NVMe has much simpler command ordering requirements than SCSI.  In
  particular, the HBA is not required to enforce any specific ordering
  for requests with overlapping LBAs.  The host is required to manage
  that ordering.  However, fused commands (currently only COMPARE and
  WRITE NVM commands can be fused) are required to be executed
  atomically.

  To support fused commands, make the second half of a fused command
  block on the first half, and have commands submitted after a fused
  command pair block on the second half.

- Add handlers and command tables for admin and NVM commands that
  operate on individual namespaces and will be passed down from an
  NVMe over Fabrics controller to a CTL LUN.

Reviewed by:	ken, imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44720
2024-05-02 16:32:09 -07:00
John Baldwin
6f308bcf57 ctl: Support NVMe requests in debug trace functions
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44719
2024-05-02 16:31:34 -07:00
John Baldwin
a1eda74167 nvmf: The in-kernel NVMe over Fabrics host
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8).  An nvmeX new-bus device is created
at the top-level to represent the remote controller similar to PCI
nvmeX devices for PCI-express controllers.

As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
pass through commands can be submitted to either the namespace devices
or the controller device.  For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.

nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport.  nvmf does not support nvd(4), only nda(4).

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44714
2024-05-02 16:29:37 -07:00
John Baldwin
59144db3fc nvmf_tcp: Add a TCP transport for NVMe over Fabrics
Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c).  One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs.  Thus, the data transfer for a given command is managed entirely
in the transport backend.

Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair.  On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread.  The transmit thread converts
command response capsules into suitable TCP PDUs where each PDU is
described by an mbuf chain that is then queued to the backing socket's
send buffer.  Command capsules can embed data along with the NVMe
command.

On the receive side, a socket upcall notifies the receive kthread when
more data arrives.  Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread.  PDUs such as R2T or data
related PDUs are handled internally, with callbacks invoked if a data
transfer encounters an error, or once the data transfer has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.

struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule-data as described in the NVMeoF
spec.  Data related PDUs such as R2T, C2H, and H2C are associated with
a command buffer except in the case of the send_controller_data
transport method which simply constructs one or more C2H PDUs from the
caller's mbuf chain.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44712
2024-05-02 16:28:47 -07:00
John Baldwin
aa1207ea4f nvmf: Add infrastructure kernel module for NVMe over Fabrics
nvmf_transport.ko provides routines for managing NVMeoF queue pairs
and capsules.  It provides a glue layer between transports (such as
TCP or RDMA) and an NVMeoF host (initiator) and controller (target).

Unlike the synchronous API exposed to the host and controller by
libnvmf, the kernel's transport layer uses an asynchronous API built
on callbacks.  Upper layers provide callbacks on queue pairs that are
invoked for transport errors (error_cb) or anytime a capsule is
received (receive_cb).

Data transfers for a command are usually associated with a callback
that is invoked once a transfer has finished either due to an error
or successful completion.

For an upper layer that is a host, command capsules are allocated and
populated with an NVMe SQE by calling nvmf_allocate_command.  A data
buffer (described by a struct memdesc) can be associated with a
command capsule before it is transmitted via nvmf_capsule_append_data.
This function accepts a direction (send vs receive) as well as the
data transfer callback.  The host then transmits the command via
nvmf_transmit_capsule.  The host must ensure that the data buffer
described by the 'struct memdesc' remains valid until the data
transfer callback is called.  The queue pair's receive_cb callback
should match received response capsules up with previously transmitted
commands.

For the controller, incoming commands are received via the queue
pair's receive_cb callback.  nvmf_receive_controller_data is used to
retrieve any data from a command (e.g. the data for a WRITE command).
It can be called multiple times to split the data transfer into
smaller sizes.  This function accepts an I/O completion callback that
is invoked once the data transfer has completed.
nvmf_send_controller_data is used to send data to a remote host in
response to a command.  In this case a callback function is not used
but the status is returned synchronously.  Finally, the controller can
allocate a response capsule via nvmf_allocate_response populated with
a supplied CQE and send the response via nvmf_transmit_capsule.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44711
2024-05-02 16:28:32 -07:00
Emmanuel Vadot
11d79c4756 linuxkpi: Add linuxkpi_video module
This contain the hdmi code and the aperture code like in linux.

Differential Revision:	https://reviews.freebsd.org/D44925
Reviewed by:		bz
Obtained from:		drm-kmod
Sponsored by:		Beckhoff Automation GmbH & Co. KG
2024-04-30 07:42:31 +02:00
Emmanuel Vadot
7f84bb34a1 linuxkpi: hdmi: Split the module declaration to a new file
In order to have a proper linuxkpi_video kmod, move the module declaration
to a new file as linuxkpi_video will also include linux_hdmi.c

Differential Revision:	https://reviews.freebsd.org/D44926
Reviewed by:		bz, emaste, wulf
Sponsored by:		Beckhoff Automation GmbH & Co. KG
2024-04-30 07:41:48 +02:00
Ricardo Branco
78444b5ade glabel: Add support for Linux swap
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1205
2024-04-28 22:39:47 -06:00
Christos Margiolis
25723d6636 sound: Retire unit.*
The unit.* code is largely obsolete and imposes limits that are no
longer needed nowadays.

- Capping the maximum allowed soundcards in a given machine. By default,
  the limit is 512 (snd_max_u() in unit.c), and the maximum possible is
  2048 (SND_UNIT_UMAX in unit.h). It can also be tuned through the
  hw.snd.maxunit loader(8) tunable. Even though these limits are large
  enough that they should never cause problems, there is no need for
  this limit to exist in the first place.
- Capping the available device/channel types. By default, this is 32
  (snd_max_d() in unit.c). However, these types are pre-defined in
  pcm/sound.h (see SND_DEV_*), so the cap is unnecessary when we know
  that their number is constant.
- Capping the number of channels per-device. By default, the limit 1024
  (snd_max_c() in unit.c). This is probably the most problematic of the
  limits mentioned, because this limit can never be reached, as the
  maximum is hard-capped at either hw.snd.maxautovchans (16 by default),
  or SND_MAXHWCHAN and SND_MAXVCHANS.

These limtits are encoded in masks (see SND_U_MASK, SND_D_MASK,
SND_C_MASK in unit.h) and are used to construct a bitfield of the form
[dsp_unit, type, channel_unit] in snd_mkunit() which is assigned to
pcm_channel->unit.

This patch gets rid of everything unit.*-related and makes a slightly
different use of the "unit" field to only contain the channel unit
number. The channel type is stored in a new pcm_channel->type field, and
the DSP unit number need not be stored at all, since we can fetch it
from device_get_unit(pcm_channel->dev). This change has the effect that
we no longer need to impose caps on the number of soundcards,
device/channel types and per-device channels. As a result the code is
noticeably simplified and more readable.

Apart from the fact that the hw.snd.maxunit loader(8) tunable is also
retired as a side-effect of this patch, sound(4)'s behavior remains the
same.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Reviewed by:	dev_submerge.ch
Differential Revision:	https://reviews.freebsd.org/D44912
2024-04-28 21:48:24 +02:00
Gleb Smirnoff
c68eed82a3 accf_tls: accept filter that waits for TLS handshake header 2024-04-24 17:53:10 -07:00
Martin Matuska
0d4ad64077 zfs: merge openzfs/zfs@1f940de07
Notable upstream pull request merges:
 #16038 1f940de07 L2ARC: Cleanup buffer re-compression
 #16093 c183d164a Parallel pool import
 #16094 cd3e6b4f4 Add zfetch stats in arcstats
 #16103 35bf25848 Fix: FreeBSD Arm64 does not build currently
 #16104 4036b8d02 Refactor dbuf_read() for safer decryption
 #16110 9f83eec03 Handle FLUSH errors as "expected"
 #16117 c346068e5 zfs get: add '-t fs' and '-t vol' options

Obtained from:  OpenZFS
OpenZFS commit: 1f940de072
2024-04-23 23:59:18 +02:00
Ricardo Branco
a8fd0a5f44 glabel: Remove support for old reiserfs
Reviewed by: imp, emaste
Pull Request: https://github.com/freebsd/freebsd-src/pull/1101
2024-04-19 16:48:28 -06:00