Commit graph

43133 commits

Author SHA1 Message Date
Bjoern A. Zeeb
fa3dfeff95 dpaa2: fix MRU for dpni (and software vlans along)
0480dccd3f tried to fix the MTU for software VLANs given dpni
announces IFCAP_VLAN_MTU.  Unfortunately the initial MRU during
setup is reduced from the maximum supported by the HW to our
maximum ethernet RX frame length so only after further mtu toggles
the solution there would work.
Set the maximum RX frame size (without CRC) to jumbo length +
vlan encap len by default given we also announce IFCAP_JUMBO_MTU.

While here improve the manual (ioctl) MTU setting by checking if
IFCAP_VLAN_MTU is currently enabled and only then add the extra
bytes.

Fixes:		0480dccd3f
MFC after:	3 days
Reviewed by:	dsl
Differential Revision: https://reviews.freebsd.org/D47066
2024-10-12 22:13:39 +00:00
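The ioctl-path rule from fa3dfeff95 (add the VLAN bytes only while IFCAP_VLAN_MTU is enabled) can be sketched roughly as follows. This is a minimal illustration, not the dpaa2 driver code: the function name and the SKETCH_ constants are hypothetical stand-ins, and computing the frame length as MTU plus Ethernet header is an assumption.

```c
#include <stdint.h>

/* Stand-in constants for illustration; the kernel headers define the real ones. */
#define SKETCH_ETHER_HDR_LEN	14u	/* Ethernet header, no CRC */
#define SKETCH_VLAN_ENCAP_LEN	4u	/* one 802.1Q tag */
#define SKETCH_IFCAP_VLAN_MTU	0x08u	/* hypothetical capability bit */

/*
 * Maximum RX frame length (without CRC) for a requested MTU.
 * The VLAN encapsulation bytes are added only when the
 * VLAN_MTU capability is currently enabled.
 */
static uint32_t
sketch_max_rx_frame_len(uint32_t mtu, uint32_t capenable)
{
	uint32_t len = mtu + SKETCH_ETHER_HDR_LEN;

	if ((capenable & SKETCH_IFCAP_VLAN_MTU) != 0)
		len += SKETCH_VLAN_ENCAP_LEN;
	return (len);
}
```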
Arvydas Sidorenko
4584c8f0ef lpt: check readiness with predefined macros
Replace spelled-out bits with pre-defined macros for those same bits.
No functional change.

PR: 170076
Reviewed by: imp
2024-10-12 14:40:24 -06:00
Colin Percival
c808132731 acpi_gpiobus: OR GPIO_PIN_(IN|OUT)PUT into flags
Right now flags is set to 0 before this "=" -> "|=" change, but it will
matter when the NOT_YET section above becomes effective.

MFC after:	2 weeks
Sponsored by:	Amazon
2024-10-12 11:14:25 -07:00
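Why the "=" -> "|=" change in c808132731 matters once flags can be non-zero is easy to see in a small sketch. The flag names and values below are hypothetical and are not the real GPIO_PIN_* definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_PIN_INPUT	0x01u
#define SKETCH_PIN_OUTPUT	0x02u
#define SKETCH_PIN_PULLUP	0x20u

/*
 * OR the direction bit into the existing flags. A plain assignment
 * would silently discard any bits (such as a pull-up) set earlier.
 */
static uint32_t
sketch_add_direction(uint32_t flags, int is_output)
{
	flags |= is_output ? SKETCH_PIN_OUTPUT : SKETCH_PIN_INPUT;
	return (flags);
}
```

With flags starting at 0 both forms give the same result, which matches the commit's remark that the change only matters later.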
Zhenlei Huang
c7a2636889 axgbe: Fix setting promisc mode
Ethernet drivers should respect IFF_PROMISC rather than IFF_PPROMISC.
The latter is for user-requested promisc mode; it implies the former
but not vice versa. Some in-kernel components such as if_bridge(4) and
bpf(4) will set promisc mode for interfaces on-demand.

While here, update the debugging message to be less confusing.

This was spotted while reviewing markj@ 's work D46524.

Test from Franco shows that the interface seems to be unconditionally
initialized to promisc mode regardless of this fix. That needs further
investigation.

Reviewed by:	markj, Franco Fichtner <franco@opnsense.org>
Tested by:	Franco Fichtner <franco@opnsense.org>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D46794
2024-10-12 21:56:56 +08:00
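The distinction drawn in c7a2636889 can be sketched as below. This is an illustration under assumptions, not the axgbe code: the flag values are stand-ins and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration. */
#define SKETCH_IFF_PROMISC	0x00100u	/* interface is in promisc mode */
#define SKETCH_IFF_PPROMISC	0x20000u	/* user explicitly requested it */

/*
 * A driver should key its hardware filter on the PROMISC bit.
 * Testing only PPROMISC would miss the case where an in-kernel
 * consumer (a bridge, a packet capture) enabled promisc mode.
 */
static bool
sketch_want_promisc(uint32_t if_flags)
{
	return ((if_flags & SKETCH_IFF_PROMISC) != 0);
}
```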
Kevin Bowling
516d92304a igc: Improve a comment and update copyright dates
MFC after:	1 week
2024-10-12 00:53:57 -07:00
Kevin Bowling
bc9402abdd igc: Add AIM
igc is derived from igb and has never had an AIM implementation. The
same algorithm from e1000 is appropriate here.

Upon more detailed study of the Linux driver which has a newer AIM
implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.

Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:

1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate.. better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.

For a single NIC this might be worth a few sys% on common CPUs, but may
be meaningful when multiplied such as if_lagg, if_bridge and forwarding
setups.

The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.

[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf

MFC after:	1 week
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Sponsored by:	BBOX.io
Differential Revision:	https://reviews.freebsd.org/D47053
2024-10-12 00:36:39 -07:00
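The "holdoff timer, not interrupt limit" reading in bc9402abdd comes down to a reciprocal: a displayed interrupt rate corresponds to a minimum gap between interrupts. The sketch below shows only that arithmetic. It does not model the AIM algorithm or the register units of any particular chip, and the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a maximum interrupt rate (interrupts per second) into the
 * equivalent holdoff interval in microseconds. A rate of 0 is treated
 * here as "no holdoff", which is an assumption of this sketch.
 */
static uint32_t
sketch_holdoff_usecs(uint32_t max_irqs_per_sec)
{
	if (max_irqs_per_sec == 0)
		return (0);
	return (1000000u / max_irqs_per_sec);
}
```

Under this view the static 8000/s setting is a 125 microsecond holdoff. Lowering latency means shortening that interval, and reducing overhead at bulk rates means lengthening it.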
Warner Losh
0bd23ca0ee uart: Fix cut-n-paste error in DBG2 code
This code is parsing the DBG2 ACPI table, not the SPCR table, so tweak
the comment.

Sponsored by:		Netflix
2024-10-11 15:27:46 -06:00
Warner Losh
9bb56359e6 uart: Document rw:XXX field of hw.uart.console
Add a one-liner description of rw - Register Width added in eae36de826.

Fixes: eae36de826
Sponsored by:		Netflix
2024-10-11 15:27:46 -06:00
Warner Losh
852233cf76 uart: Small style tweak
Use if (err == 0) rather than if (!err) to follow style(9) and also the
rest of the file.

Sponsored by:		Netflix
2024-10-11 15:27:46 -06:00
Kevin Bowling
3e501ef896 e1000: Re-add AIM
We originally left this out because iflib modulates interrupts and
accomplishes some level of batching versus the custom queues in the
older driver. Upon more detailed study of the Linux driver which has a
newer implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.

Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:

1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate.. better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.

For a single NIC this might be worth a few sys% on common CPUs, but may
be meaningful when multiplied such as if_lagg, if_bridge and forwarding
setups.

The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.

I have retroactively added it to lem(4) and em(4) where the same concept
applies, albeit to a single ITR register.

[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf

Tested by:	cc (https://wiki.freebsd.org/chengcui/testD46768)
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Sponsored by:	BBOX.io
Differential Revision:	https://reviews.freebsd.org/D46768
2024-10-10 22:36:43 -07:00
Konstantin Belousov
8e5b07dd08 mlx5_ipsec: add enough #ifdef IPSEC_OFFLOAD to make LINT_NOIP compilable
Reported by:	kp
Sponsored by:	NVidia networking
Fixes:	2851aafe96
2024-10-10 16:18:11 +03:00
Konstantin Belousov
2851aafe96 mlx5 ipsec_offload: ensure that driver does not dereference dead sahindex
Take the sahtree rlock and check for the DEAD SA state before validating
and filling the SA xfrm attributes.

Sponsored by:	NVidia networking
2024-10-10 12:55:45 +03:00
Roger Pau Monné
e7fe856437 xen/blk{front,back}: fix usage of sector sizes different than 512b
The size reported in the 'sectors' xenbus node is always in units of 512b,
regardless of the value of the 'sector-size' node.  The sector offsets in
the ring requests are also always based on 512b sectors, regardless of the
'sector-size' reported in xenbus.

Fix both blkfront and blkback to assume 512b sectors in the required fields.

The blkif.h public header has been recently updated in upstream Xen repository
to fix the regressions in the specification introduced by later modifications,
and clarify the base units of xenstore and shared ring fields.

PR: 280884
Reported by: Christian Kujau
MFC after: 1 week
Sponsored by: Cloud Software Group
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D46756
2024-10-08 09:29:13 +02:00
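The unit rule in e7fe856437 can be sketched as follows, assuming hypothetical helper names. This is not the blkfront/blkback code, only the conversion it implies.

```c
#include <stdint.h>

/* The 'sectors' node and ring offsets always count 512-byte units. */
#define SKETCH_XEN_SECTOR_SIZE	512u

/* Disk size in bytes, independent of the advertised 'sector-size'. */
static uint64_t
sketch_disk_bytes(uint64_t xenbus_sectors)
{
	return (xenbus_sectors * SKETCH_XEN_SECTOR_SIZE);
}

/* Number of device-native sectors for a given 'sector-size' value. */
static uint64_t
sketch_native_sectors(uint64_t xenbus_sectors, uint32_t sector_size)
{
	return (sketch_disk_bytes(xenbus_sectors) / sector_size);
}
```

Multiplying 'sectors' by 'sector-size' instead would overstate a 4096-byte-sector disk by a factor of eight.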
Warner Losh
6c711019f2 nvme: Don't create sysctl for io queues not created
When we can't set the number of I/O queues on the admin queue, we
continue on. However, we don't create the I/O queue structures, so
having pointers (NULL) into them for sysctls makes no sense and leads to
a panic when accessed. When summing up different stats, also skip the
ioq stats when it's NULL.

Sponsored by:		Netflix
2024-10-07 22:22:40 -06:00
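The stat-summing half of 6c711019f2 amounts to a NULL guard. Below is a minimal sketch with hypothetical structure and field names. The real nvme(4) structures differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the controller structures. */
struct sketch_qpair {
	uint64_t num_cmds;
};

struct sketch_ctrlr {
	struct sketch_qpair adminq;
	struct sketch_qpair *ioq;	/* NULL if I/O queues were never created */
	unsigned num_io_queues;
};

/* Sum a counter over all queues, skipping I/O queues that do not exist. */
static uint64_t
sketch_total_cmds(const struct sketch_ctrlr *c)
{
	uint64_t total = c->adminq.num_cmds;

	if (c->ioq != NULL) {
		for (unsigned i = 0; i < c->num_io_queues; i++)
			total += c->ioq[i].num_cmds;
	}
	return (total);
}
```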
Navdeep Parhar
52e5a66eac cxgbe(4): Use correct synchronization when marking the adapter offline.
adapter->flags are guarded by a synch_op, as noted in the comment in
adapter.h where the flags are defined.

Fixes:	5241b210a4 cxgbe(4): Basic infrastructure for ULDs to participate in adapter reset.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-10-07 10:25:53 -07:00
Roger Pau Monné
9a73b5b1e8 xen: remove PV suspend/resume support copyright
The code for PV suspend/resume support has long been removed, also remove the
copyright notice associated with it.

There are still two copyright blocks with (to my understanding) slightly
different wordings of the BSD 2 clause license.  I however don't feel like
merging them due to those wording differences.

The removal of the PV suspend/resume code was done in
ed95805e90.

Sponsored by: Cloud Software Group
Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D46860
2024-10-07 18:59:45 +02:00
Roger Pau Monné
9dd5105f22 xen: expose support for poweroff/reboot/suspend on xenbus
Some toolstacks won't attempt to signal power actions on xenbus unless the VM
explicitly exposes support for them.  FreeBSD supports all power actions, hence
signal on xenbus such support by setting the nodes to the value of "1".

Sponsored by: Cloud Software Group
Reviewed by: markj
Differential review: https://reviews.freebsd.org/D46859
2024-10-07 18:59:45 +02:00
Florian Walpen
e0c37c160b snd_hdsp(4): Support AO4S-192 and AI4S-192 extension boards.
Create an additional 4 channel pcm device for RME HDSP 9632 sound cards,
to support the optional AO4S-192 and AI4S-192 extension boards. For
simplicity, the <HDSP 9632 [ext]> pcm device is always present, even if
the extension boards are not installed.

Unfortunately I cannot test this with actual hardware, but I made sure
the additional channels do not affect the functionality of the HDSP 9632
as currently in src.

Reviewed by: christos, br
Differential Revision: https://reviews.freebsd.org/D46837
2024-10-04 19:51:49 +01:00
Florian Walpen
8fb4675688 snd_hdspe(4): Addendum to AO4S-192 and AI4S-192 support.
Fix unified pcm mode after support for the AO4S-192 and AI4S-192
extension boards was added. Adjust the man page accordingly.

Reviewed by:	br
Differential Revision:	https://reviews.freebsd.org/D46946
2024-10-04 19:46:39 +01:00
Ruslan Bukin
9e7e15b539 snd_hdspe(4): AO4S/AI4S support.
Add support for RME AO4S/AI4S extension cards. They are designed as a bracket
with 4 stereo TRS jacks each.

https://archiv.rme-audio.de/download/ao4s192_e.pdf
https://archiv.rme-audio.de/download/ai4s192_e.pdf

Reviewed by: Florian Walpen <dev@submerge.ch>
Differential Revision: https://reviews.freebsd.org/D46409
2024-10-04 19:36:06 +01:00
Bartosz Fabianowski
cd8c3af747 ACPI: Treat all 20-element _BIX entries as revision 0
Some Fujitsu Lifebooks return an invalid _BIX object. The first element
of _BIX is a revision number, which indicates what elements will follow:
* ACPI 4.0 defined _BIX revision 0 with 20 elements.
* ACPI 6.0 introduced _BIX revision 1 with 21 elements.
The problem is that the offending Lifebooks have a non-zero _BIX
revision, but provide 20 fields only.

The ACPICA parser chokes on this [1], but that seems to be
inconsequential. More importantly, our own battery info handling code
also verifies that for revision > 0, there are at least 21 fields - and
refuses to process the invalid _BIX. One workaround would be to
introduce special case / quirk handling for Fujitsu Lifebooks. A better
one is to relax the requirements check: If there are only 20 elements,
treat the _BIX as revision 0, no matter what revision number was
provided by the device.

Linux doesn't run into this problem by the way because it only supports
the 20 fields defined in the ACPI 4.0 spec [3]. It never looks at the
revision number or the 21st field added in ACPI 6.0.

[1] https://cgit.freebsd.org/src/tree/sys/contrib/dev/acpica/components/namespace/nsprepkg.c#n815
[2] https://cgit.freebsd.org/src/tree/sys/dev/acpica/acpi_cmbat.c#n371
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/battery.c#n418

PR: 252030
Reviewed by: imp
MFC After: 2 weeks
2024-10-02 12:30:15 -06:00
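The relaxed check described in cd8c3af747 can be sketched as a small decision function. The name and the -1 error convention are hypothetical, and this is not the acpi_cmbat.c code.

```c
#include <stdint.h>

#define SKETCH_BIX_REV0_ELEMENTS	20u	/* ACPI 4.0 layout */

/*
 * Decide which _BIX revision to parse a package as.
 * Returns -1 if the package is too short to be any known revision.
 * A package with exactly 20 elements is treated as revision 0,
 * whatever revision number the firmware reported.
 */
static int
sketch_bix_effective_revision(uint32_t reported_rev, uint32_t nelements)
{
	if (nelements < SKETCH_BIX_REV0_ELEMENTS)
		return (-1);
	if (nelements == SKETCH_BIX_REV0_ELEMENTS)
		return (0);
	return ((int)reported_rev);
}
```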
Warner Losh
ab03b79062 uart: Add entry for an Intel UART
While we really should infer this baud-clock rate in some cases, use the
right baud-clock for this device.

Sponsored by:		Netflix
2024-10-02 12:29:24 -06:00
Konstantin Belousov
6dcffb980f hyperv: call smp_targeted_tlb_shootdown_native() with pin
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-10-01 14:45:23 +03:00
Navdeep Parhar
9ba8670a8b cxgbe(4): Allow t4_tom to be unloaded safely.
* Disable IFCAP_TOE automatically on all ifnets on all adapters during
  unload.  This is user-friendly and avoids panics due to stale ifnet
  state after t4_tom is unloaded.
* Do not allow unload if tids are in use by the TOE on any adapter.

Reported by:	Bimal Abraham @ Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-09-29 17:38:11 -07:00
Navdeep Parhar
cc110bbec6 cxgbe/t4_tom: Remove duplicate unlock in t4_tom_deactivate.
Fixes:	c1c524852f cxgbe/t4_tom: Implement uld_stop and uld_restart for ULD_TOM.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-09-29 17:38:11 -07:00
Kevin Bowling
33ed9bdca3 igc: Add NVM/firmware prints and sysctl
This chipset suffered an (un)usual number of bugs and iterations. Let's
add our NVM/firmware code from e1000 and the similar igc_nvm function
from DPDK to keep track of issues.

MFC after:	1 week
Sponsored by:	BBOX.io
2024-09-29 03:06:03 -07:00
Kevin Bowling
a40ecb6f74 igc: Remove non-existent legacy absolute and packet timers
igc, derived from igb, does not use these registers. All interrupt
timing is governed by EITR or LLI and driven by write-back.

MFC after:	1 week
Sponsored by:	BBOX.io
2024-09-28 21:57:37 -07:00
Kevin Bowling
1e3b1870ad ixgbe: Switch if_sriov read/write back to ixgbe_mbx APIs
These are more succinct than jumping through the function pointers
directly and add some additional error handling.

MFC after:	1 week
2024-09-28 21:17:21 -07:00
Doug Moore
5a5da24fc8 mlx5: optimize ilog2 calculation
Rather than compute ilog2(roundup_pow_of_two(x)), which invokes ilog2
twice, just use order_base_2 once.  And employ that optimization
twice.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D46838
2024-09-28 16:24:44 -05:00
Doug Moore
3873b9a8b3 mlx4: use is_power_of_2
It's faster to use is_power_of_2 than it is to compute
roundup_power_of_two and then compare.  So do that.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D46838
2024-09-28 16:23:17 -05:00
Doug Moore
65c4ec887e gdma: use ispower2
It's faster to use ispower2(n) than it is to compute
roundup_pow_of_two and do a comparison.  So do the former.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D46838
2024-09-28 16:17:03 -05:00
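The direct test preferred in 3873b9a8b3 and 65c4ec887e is typically a single bit trick. A hedged sketch follows, with a hypothetical name. Treating zero as "not a power of two" is a choice made here and may differ from the kernel macros.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A power of two has exactly one bit set, so clearing the lowest
 * set bit with n & (n - 1) leaves zero. No rounding or comparison
 * against a rounded-up value is needed.
 */
static bool
sketch_is_power_of_2(uint64_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}
```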
Doug Moore
c44fbfdb56 roundup_pow_of_two: don't take the log of it
Based on the definitions, ilog2(roundup_pow_of_two(x)) ==
order_base_2(x). Replace the former with the latter in a few places to
save a few calculations.

Reviewed by:	bz, kib
Differential Revision:	https://reviews.freebsd.org/D46827
2024-09-28 12:00:06 -05:00
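The identity behind c44fbfdb56 can be checked with simple reference implementations. The three helpers below are illustrative stand-ins written for this sketch, not the LinuxKPI definitions, and they assume x > 0.

```c
#include <stdint.h>

/* floor(log2(x)), for x > 0 */
static unsigned
sketch_ilog2(uint64_t x)
{
	unsigned r = 0;

	while ((x >>= 1) != 0)
		r++;
	return (r);
}

/* Smallest power of two >= x, for 0 < x <= 2^63 */
static uint64_t
sketch_roundup_pow_of_two(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return (p);
}

/* ceil(log2(x)), for x > 0 */
static unsigned
sketch_order_base_2(uint64_t x)
{
	return (x <= 1 ? 0 : sketch_ilog2(x - 1) + 1);
}
```

For x = 5, rounding up gives 8 and ilog2(8) is 3, which is also ceil(log2(5)), so one call replaces two.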
Kevin Bowling
911b3c3aa6 Revert "e1000: Remove redundant EITR shift from igb"
Turns out this is necessary

This reverts commit 26439b5787.
2024-09-28 02:11:55 -07:00
Kevin Bowling
26439b5787 e1000: Remove redundant EITR shift from igb
The E1000_EITR() macro is already multiplying by 0x4 which is the same
as this shift, so we were shifting more than expected.

MFC after:	6 days
Sponsored by:	BBOX.io
2024-09-27 20:36:00 -07:00
Konstantin Belousov
f713ed6694 iommu: extend iommu_map_entry to store the list of associated freed page table pages
The pages are inserted into the added slist if the entry parameter is
passed to iommu_pgfree().  For now it is nop.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-27 20:34:23 +03:00
Konstantin Belousov
bdd5eb33ca iommu: change iommu_domain_map_ops to take iommu_map_entry
instead of base/size.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-27 20:34:23 +03:00
Konstantin Belousov
d50403a691 iommu: add per-unit sysctls reporting the state of DMA and interrupt remapping
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-27 20:34:23 +03:00
Pierre Pronchery
869d760cb9 bhyve: avoid TOCTOU on iov_len in virtio_vq_recordon()
Avoid a race condition when accessing guest memory, by reading memory
contents only once.

This has also been applied to _vq_record() in
sys/dev/beri/virtio/virtio.c, as per markj@'s suggestion.

Reported by:	Synacktiv
Reviewed by:	markj
Security:	HYP-10
Sponsored by:	The Alpha-Omega Project
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45735
2024-09-27 10:20:53 -04:00
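The "read only once" fix in 869d760cb9 follows a general pattern: snapshot the guest-controlled value into a local, then validate and use only the local. The sketch below shows that pattern with hypothetical types and names. It is not the bhyve virtio code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor living in memory the guest can rewrite. */
struct sketch_desc {
	volatile uint32_t len;
};

/*
 * Copy len bytes from src to dst. Returns the number of bytes copied,
 * or -1 if the guest-supplied length exceeds the destination buffer.
 * The length is read exactly once, so the value that passes the check
 * is the same value used for the copy.
 */
static int
sketch_copy_checked(const struct sketch_desc *d, const uint8_t *src,
    uint8_t *dst, size_t dstsz)
{
	uint32_t len = d->len;	/* single read of guest memory */

	if (len > dstsz)
		return (-1);
	memcpy(dst, src, len);
	return ((int)len);
}
```

Reading d->len once for the check and again for the copy would let a concurrent guest write slip a larger value past the bounds test.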
Kevin Bowling
9bf9164fc8 e1000: Clean up ITR/EITR in preparation for AIM
Provide macros to derive the various needed values and make it a bit
more clear the differences between em and igb.

The igb default EITR was not landing at the right offset.

Respect the 'max_interrupt_rate' tunable.

MFC after:	1 week
Sponsored by:	BBOX.io
2024-09-27 01:30:05 -07:00
Kevin Bowling
1c578f1c93 e1000: Clean up legacy absolute and packet timers
The absolute and packet timers only apply to lem and em with some only
applying to the latter.

This cleans up the sysctl tree to only show these where applicable and
stops writing to unexpected registers for igb.

MFC after:	1 week
Sponsored by:	BBOX.io
2024-09-26 23:45:04 -07:00
Justin Hibbits
21525fe03c sdhci: Add sysctl to report quirks on the slot
Summary:
It can be useful to see what quirks are applied on an SDHCI slot.

Obtained from:	Juniper Networks, Inc.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D46790
2024-09-26 09:58:54 -04:00
Tom Jones
99adbd1b3f gpioc: Fix handling of priv data during open
Fix the ordering of priv data creation with setting priv data. This
handles failure better and resolves a panic when repeatedly running
tools/tools/gpioevents.

Explicitly initialise more fields in priv data while we are here.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46568
2024-09-26 10:16:17 +01:00
John Baldwin
ef052adf09 nvmf: Narrow scope of sim lock in nvmf_sim_io
nvmf_submit_request() handles races with concurrent queue pair
destruction (or the queue pair being destroyed between
nvmf_allocate_request and nvmf_submit_request), so the lock is not
needed here.  This avoids holding the lock across transport-specific
logic such as queueing mbufs for PDUs to a socket buffer, etc.

Holding the lock across nvmf_allocate_request() ensures that the queue
pair pointers in the softc are still valid as shutdown attempts will
block on the lock before destroying the queue pairs.

Sponsored by:	Chelsio Communications
2024-09-25 21:14:06 -04:00
John Baldwin
aec2ae8b57 nvmf: Always use xpt_done instead of xpt_done_direct
The last reference on a pending I/O request might be held by an mbuf
in the socket buffer.  When this mbuf is freed, the I/O request is
completed which triggers completion of the CCB.  However, this can
occur with locks held (e.g. with so_snd locked when the mbuf is freed
by sbdrop()) raising a LOR between so_snd and the CAM device lock.
Instead, defer CCB completion processing to a thread where locks are
not held.

Sponsored by:	Chelsio Communications
2024-09-25 21:10:44 -04:00
Val Packett
6a4f0c0637 pci_iov: Add a device_printf if out of bus numbers
Reviewed by:	imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20591
2024-09-25 15:17:16 -07:00
John Baldwin
b1d324d987 ctl: Move extern for control_softc into <cam/ctl/ctl_private.h>
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46778
2024-09-25 10:21:18 -04:00
Joyu Liao
930a1e6f3d e1000: Delay safe_pause switch until SI_SUB_CLOCKS
Based on sysinit_sub_id, SI_SUB_CLOCKS is after SI_SUB_CONFIGURE.

SI_SUB_CONFIGURE  = 0x3800000,  /* Configure devices */  
At this stage, the variable “cold” will be set to 0.

SI_SUB_CLOCKS    = 0x4800000,  /* real-time and stat clocks*/
At this stage, the clock configuration will be done, and the real-time
clock can be used.

In the e1000 driver, if the API safe_pause_* are called between
SI_SUB_CONFIGURE and SI_SUB_CLOCKS stages, it will choose the wrong
clock source. The API safe_pause_* uses “cold” the value of which is
updated in SI_SUB_CONFIGURE, to decide if the real-time clock source is
ready. However, the real-time clock is not ready until the SI_SUB_CLOCKS
routines are done.

Obtained from:	Juniper Networks
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42920
2024-09-25 02:37:37 -07:00
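The window described in 930a1e6f3d can be made concrete with the two sysinit values quoted in the message. The predicates below are a hypothetical model of boot ordering only. Treating "cold" as cleared from SI_SUB_CONFIGURE onward is a simplification, and none of this is the e1000 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subsystem IDs as quoted in the commit message. */
#define SKETCH_SI_SUB_CONFIGURE	0x3800000u
#define SKETCH_SI_SUB_CLOCKS	0x4800000u

/* Simplified model: "cold" is cleared once device configuration runs. */
static bool
sketch_cold_cleared(uint32_t stage)
{
	return (stage >= SKETCH_SI_SUB_CONFIGURE);
}

/* The real-time clock is usable only after the clock sysinits ran. */
static bool
sketch_rt_clock_ready(uint32_t stage)
{
	return (stage >= SKETCH_SI_SUB_CLOCKS);
}
```

Any stage between the two values has cold cleared while the real-time clock is still unavailable, which is the gap where a check on cold alone picks the wrong clock source.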
Navdeep Parhar
ee3da604dd cxgbe(4): Clobber all tracer state on stop and redo only traceq on restart.
Tracers have to be recreated after a restart but that's okay given that
they are used for debugging only.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-09-24 16:52:22 -07:00
Bjoern A. Zeeb
5b8f97d8db usb: change LIST to SLIST to avoid LinuxKPI conflicts
In order to better integrate modern LinuxKPI USB this tries to reduce
a contention point of "LIST".  Given there is no need to use a LIST here
change it to SLIST to avoid conflicts.
It is a workaround which does not solve the actual problem (overlapping
namespaces) but it helps us a lot for now.

Sponsored by:	The FreeBSD Foundation
X-MFC?		unclear
Reviewed by:	emaste
Differential Revision: https://reviews.freebsd.org/D46534
2024-09-24 22:53:28 +00:00
John Baldwin
1b3fa1ac36 nvmft: Defer datamove operations to a pool of taskqueue threads
Some block devices may request datamove operations from an ithread
context while holding locks.  Queue datamove operations to a taskqueue
backed by a thread pool to safely permit blocking allocations, etc. in
datamove handling.

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46551
2024-09-24 16:16:11 -04:00