Summary:
These came in the original DrvAPI commits in 2014, and are obsoleted by
bpf_mtap_if() and ether_bpf_mtap_if(). The `_if` suffix, rather than a
prefix, conveys that the function operates on the bpf of the interface
rather than on the interface itself.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41146
(cherry picked from commit 2a3716432d209c5fef1eb1a719f4c1914e7c8b5a)
Include opt_inet.h and opt_inet6.h early in the files including
virtio_net.h, since they use INET and/or INET6.
While there, remove redundant inclusion of sys/types.h, since it is
included already by sys/param.h.
There was a discussion about also including opt_inet.h and opt_inet6.h
in virtio_net.h. glebius suggested adding a mechanism for files
to check whether the required opt_*.h files were included. virtio_net.h
will be the first consumer of this mechanism.
Reviewed by: glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52046
(cherry picked from commit 3077532b1bb2911d3012ee90bae9d9499c960569)
Remove an always-false check for whether the request has already
completed before sleeping. Even if the request is complete, the
response tag is updated while holding the channel lock, which is also
held here.
No functional change intended.
Sponsored by: Klara, Inc.
(cherry picked from commit 28c9b13b236d25512cfe4e1902411ff421a14b64)
Replace priorities specified by a base priority and some hardcoded
offset value with symbolic constants. Hardcoded offsets prevent changing
the difference between priorities without changing their relative
ordering, and are generally a dangerous practice since the resulting
priority may inadvertently belong to a different selection policy's
range.
Since RQ_PPQ is 4, differences of less than 4 are insignificant, so just
remove them. These small differences have not been changed for years,
so it is likely they have no real meaning (besides having no practical
effect). One can still consult the change history to recover them if
ever needed.
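A minimal sketch of why such small offsets have no effect, assuming the
run queue index is obtained by integer division of the priority by
RQ_PPQ and that the base priority is a multiple of RQ_PPQ. The base
value and the helper name are hypothetical and only serve the
illustration:

```c
#include <assert.h>

/* Priorities per run queue; the text above states this value is 4. */
#define	RQ_PPQ			4

/* Hypothetical base priority, chosen as a multiple of RQ_PPQ. */
#define	EXAMPLE_BASE_PRI	80

/* Assumed mapping from a priority to a run queue index. */
static int
example_rq_index(int pri)
{
	assert(pri >= 0);
	return (pri / RQ_PPQ);
}
```

Under these assumptions, EXAMPLE_BASE_PRI + 1 through EXAMPLE_BASE_PRI + 3
select the same queue as EXAMPLE_BASE_PRI; only an offset of RQ_PPQ or
more moves the thread to another queue.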
No functional change (intended).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
(cherry picked from commit 8ecc41918066422d6788a67251b22d11a6efeddf)
This was recently added to Linux to improve incremental update support,
as you could previously add Allowed-IPs but not remove without replacing
the whole set (and thus, potentially disrupting existing traffic).
Removal is incredibly straightforward; we'll find it in p_aips first
to ensure that it's actually valid for this peer, then we'll delete it
from the radix tree before we remove the corresponding p_aips entry.
Reviewed by: Jason A. Donenfeld, jhb
(cherry picked from commit d15d610fac97df4fefed3f14b31dcfbdcec65bf9)
(cherry picked from commit d1ac3e245f084ee0637bde9a446687621358c418)
We'll re-use these in a future wg_aip_del() to perfectly reconstruct
what we expect to find in a_addr/a_mask.
Reviewed by: ivy, markj (both earlier version), Aaron LI, jhb
(cherry picked from commit 2475a3dab0d5c5614e303c0022a834f725e2a078)
The only difference in the wg_aip_add() call after IP validation is the
address family. Just pull that out into a variable and avoid the two
different callsites for wg_aip_add(). A future change will add a new
call for each case to remove an address from the peer, so it's nice to
avoid needing to repeat the logic for two different branches.
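The shape of the refactoring can be sketched as follows. This is a
simplified user-space model, not the driver code: example_aip_add() is a
hypothetical stand-in for wg_aip_add(), and the length and prefix checks
are assumptions about what the validation looks like.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <stddef.h>

/* Address family seen by the last call; lets callers observe the result. */
static int example_last_af;

/* Hypothetical stand-in for wg_aip_add(). */
static int
example_aip_add(int af, const void *addr, unsigned int cidr)
{
	(void)addr;
	(void)cidr;
	example_last_af = af;
	return (0);
}

static int
example_handle_allowed_ip(const void *addr, size_t len, unsigned int cidr)
{
	int af;

	/* Validation only decides the address family. */
	if (len == sizeof(struct in_addr) && cidr <= 32)
		af = AF_INET;
	else if (len == sizeof(struct in6_addr) && cidr <= 128)
		af = AF_INET6;
	else
		return (EINVAL);

	/* Single call site; a removal path could be selected here later. */
	return (example_aip_add(af, addr, cidr));
}
```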
Reviewed by: Aaron LI, Jason A. Donenfeld, ivy, jhb, markj
(cherry picked from commit ba2607ae7dff17957d9e62ccd567ba716c168e77)
This was broken in c63d67e137f3; the early returns prevent building the
media lists as expected.
The BASE-T parts of the patch were suggested by "cyric@mm.st", while I
am adding the additional 40G AOC, 1CX, autoneg and unknown PHY fixes
based on code inspection. There may be additional work left here for
Broadcom but this is certainly better than the returns.
PR: 287395
Reported by: mickael.maillot@gmail.com, cyric@mm.st
Tested by: Einar Bjarni Halldórsson <einar@isnic.is>
(cherry picked from commit 5e6e4f752833acc96f1efc893318d3f6b74b9689)
The header file might be included after linux/stddef.h or others are
included and the macros would be re-defined.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D50156
(cherry picked from commit 152e6197615570e7a2f5f1c6c2ed00ecee9dd10c)
In order to be able to use MODULE_DEVICE_TABLE() with multiple bus
attachments, factor out the bus-specific MODULE_PNP_INFO() and place
it next to the structure defining the table.
As it turns out bnxt(4) has been using the MODULE_DEVICE_TABLE() with
PCI attachments for the "auxiliary" bus so far. That makes little sense.
Define the MODULE_PNP_INFO() to nothing for that. We may consider
pulling these LinuxKPI bits in semi-native drivers into LinuxKPI
one day as that route is not really sustainable.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, dumbbell
Differential Revision: https://reviews.freebsd.org/D51049
(cherry picked from commit 2f5666c1727c949491f73e6c3277b7b542131714)
Otherwise we can end up with a lost interrupt, causing lost request
completion wakeups and hangs in the filesystem layer.
Continue processing until we enable interrupts and then observe an empty
queue, like other virtio drivers do.
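The loop described above can be modeled in user space as follows. This
is a sketch under stated assumptions: struct example_vq and all helper
names are hypothetical, and the "late arrival" counter only simulates a
completion that lands while interrupts are still disabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue. */
struct example_vq {
	int	pending;	/* completions waiting to be processed */
	int	late_arrivals;	/* completions landing while intrs are off */
	bool	intr_enabled;
	int	processed;
};

/* Process everything currently visible in the queue. */
static void
example_drain(struct example_vq *vq)
{
	while (vq->pending > 0) {
		vq->pending--;
		vq->processed++;
	}
}

/*
 * Enable interrupts.  Returns true if entries are already pending, i.e.
 * a completion raced with the enable and no interrupt will announce it.
 */
static bool
example_enable_intr(struct example_vq *vq)
{
	vq->intr_enabled = true;
	if (vq->late_arrivals > 0) {
		vq->late_arrivals--;
		vq->pending++;
	}
	return (vq->pending > 0);
}

static void
example_intr_handler(struct example_vq *vq)
{
	for (;;) {
		example_drain(vq);
		if (!example_enable_intr(vq))
			break;	/* interrupts on and queue empty */
		/* Raced with a completion: disable and process again. */
		vq->intr_enabled = false;
	}
}
```

Returning right after the first drain would leave the late completions
unprocessed with nothing left to signal them, which is the lost wakeup
this change avoids.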
Sponsored by: Klara, Inc.
In (unknown) situations it seems the i2c bus can have trouble. While
nothing about the current link state has changed, the driver
would react by going into a link down state and start busy looping
on up to 4 cores. Even if there was a valid link, such spinning
on a CPU by a kernel thread would wreak havoc on existing and
new connections.
This patch does the following:
1. If such a bus failure occurs, we keep the last known link state.
2. Prevent busy looping by implementing the lockmgr() facility to
be able to sleep while the i2c code waits on the i2c ISR. We cap
this with a timeout.
3. Pin the admin queues to the last CPU in the system, to prevent
other scenarios where busy looping might occur from landing on CPU
0, which especially seems to cause a lot of issues.
Given the design constraints both in hardware and in software,
the lockmgr() seems to be the only viable option, even though
FreeBSD explicitly forbids sleeping in callout context, but
fails to explain why this is or offer alternatives.
axgbe: revert allocating admin queues to last CPU
The issue was resolved in 52454a1e5b.
Scheduled threads such as CARP are now no longer pinned to CPU 0, making sure
they always get their time slice even if CPUs are blocked.
Since the I/O expander chip does not do a reset when soft power
cycling, the driver will first turn off all LEDs when initializing,
although no specific routine seems to be called when powering down.
This means that the LEDs will stay on until the driver has booted up,
after which the driver will be in a consistent state.
Initially, RSF (Receive Queue Store and Forward) was disabled for
unknown reasons, but the cut-through mode that's enabled as a result
seems to send 0 length packets up to the DMA when the RX queue is
full.
Since the iflib interface needs axgbe_pci_init() and its PHY starting capabilities, no data was passed in its absence.
With the NULL check of the axgbe_miibus we also fall back to an MDIO read, as a module might be capable of both
clause 22 and clause 45 methods of communication.
With the move of phy_stop() to if_detach() in d50d4e8cd4, it's better to prevent reconfiguring the PHY should the pci_init() callout trigger more than once.
The autonegotiation code path for gigabit SFP modules contained a bug that caused
LINK_ERR to be reported when an external SFP PHY was present. Fixing this issue
did not result in a link, however: it turned out that while autonegotiation interrupts
were happening, the resulting status cannot be correctly determined in all cases. In these
specific cases we have no other option than to assume a module has negotiated to 1Gbit/s.
PHY-specific configuration has been delegated to the miibus driver, if an external PHY is present.
It's possible that the i2c bus does not recognize a PHY on the first pass, so in all cases we
retry up to a maximum of 5 times during each link poll pass to ensure we didn't miss the presence
of an external PHY.
This commit also addresses link issues on both 100 Mbit and 1Gb fiber modules. Not all of these modules
have the correct data set according to SFF-8472. As such, we first check for gigabit compliance and
the associated baudrate; otherwise we fall back to determining what type of fiber module is plugged
in by checking the baudrate, cable length and wavelength, and set the MAC speed accordingly.
It is possible for a machine to boot into a state in which the configuration register,
responsible for controlling whether an I/O signal is considered an input or output,
contains randomized values. It was assumed this was programmed by the BIOS.
If I/O is reversed, it's possible for the driver to think an SFPP module has been inserted
when there is none, leading to unrecoverable I2C errors.
The configuration register should contain a state which is determined and provided by the BIOS,
hence no hard-coded values are programmed here.
The current addition to the interrupt nesting level in
xen_arch_intr_handle_upcall() needs to be compensated in
xen_intr_handle_upcall(), otherwise interrupts dispatched by the upcall handler
end up seeing a td_intr_nesting_level of 2 or more, which makes them assume
there's been an interrupt nesting.
Such an extra interrupt nesting count leads to statclock() reporting idle time as
interrupt, as the call from interrupt context will always be seen as a nested
one (td->td_intr_nesting_level >= 2) due to the nesting count increase done by
both xen_arch_intr_handle_upcall() and intr_execute_handlers().
Fix this by adjusting the nested interrupt count before dispatching interrupts
from xen_intr_handle_upcall().
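The double increment can be reproduced with a small user-space model.
This is only a sketch: the example_* functions are hypothetical
stand-ins for the kernel functions named above, and decrementing before
dispatch and restoring afterwards is an assumption about what the
adjustment looks like.

```c
#include <stdbool.h>

/* Stand-in for td->td_intr_nesting_level. */
static int example_nesting_level;
/* Level observed by the last dispatched handler. */
static int example_seen_level;

/* Models a handler such as statclock() sampling the nesting level. */
static void
example_handler(void)
{
	example_seen_level = example_nesting_level;
}

/* Models intr_execute_handlers(): bumps the level around the handler. */
static void
example_execute_handlers(void)
{
	example_nesting_level++;
	example_handler();
	example_nesting_level--;
}

/* Models xen_intr_handle_upcall(); 'compensate' selects the fixed path. */
static void
example_handle_upcall(bool compensate)
{
	if (compensate)
		example_nesting_level--;	/* undo the arch increment */
	example_execute_handlers();
	if (compensate)
		example_nesting_level++;	/* restore for the caller */
}

/* Models xen_arch_intr_handle_upcall(): bumps the level for the upcall. */
static int
example_arch_upcall(bool compensate)
{
	example_nesting_level++;
	example_handle_upcall(compensate);
	example_nesting_level--;
	return (example_seen_level);
}
```

Without compensation the handler observes a level of 2 and treats the
interrupt as nested; with compensation it observes 1.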
PR: 277231
Reported by: Matthew Grooms <mgrooms@shrew.net>
Fixes: af610cabf1 ('xen/intr: adjust xen_intr_handle_upcall() to match driver filter')
Sponsored by: Cloud Software Group
Reviewed by: Elliott Mitchell <ehem+freebsd@m5p.com>
When no module is placed, executing `ifconfig -v` stalls for a second
per interface because the timeout is set to a static 10. This patch
makes sure this is only allowed once per insertion.
Build and sysctl configuration modes are introduced for QAT SPR
devices to disable safe dc mode. A new QAT driver build option
‘QAT_DISABLE_SAFE_DC_MODE’ is required to build the QAT driver
with code that allows a request to be sent to FW to override the
‘History Buffer’ mitigation. Default QAT driver builds do not
include this ‘QAT_DISABLE_SAFE_DC_MODE’ build option. Even if the
QAT driver was built with code that allows a request to be sent to
FW to override the ‘History Buffer’ mitigation, the QAT driver must
still be configured using sysctl to request an override of the
‘History Buffer’ mitigation if desired. The default QAT driver
configuration option sysctl dev.qat.X.disable_safe_dc_mode does not
allow override of the mitigation. The new sysctl attribute
disable_safe_dc_mode is to be set to 1 for overriding the history
buffer mitigation. Firmware for qat_4xxx is updated for this change.
If this mode is enabled, decompression throughput increases but may
result in a data leak if num_user_processes is more than 1.
This option is to be enabled only if your system is not prone to
user data leaks.
Reviewed by: markj, ziaee
MFC after: 2 weeks
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D50379
(cherry picked from commit 5a8e5215cef0dac1115853889e925099f61bb5fa)
The justification is the same as in commit
fb876eef219e ("e1000: Fix some issues in em_newitr()").
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50548
(cherry picked from commit ef062029ceffacb6bde3a5639a2bd8c4d59ca1df)
The justification is the same as in commit
a5b5220b1807 ("e1000: Initialize helper variables in em_newitr()").
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50547
(cherry picked from commit d6a9f49185797c6b67e517a3d83ef63436c8d4f3)
- Load packet and byte counters exactly once, as they can be
concurrently mutated.
- Rename bytes_packets to bytes_per_packet, which seems clearer.
- Use local variables that have the same types as the counter values,
rather than truncating unsigned long to u32.
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50416
(cherry picked from commit 731c145612dd6ffe457a562959a5c027acf13334)
Due to races with the threaded transmit and receive paths, it's possible
to have r/tx_bytes != 0 && r/tx_packets == 0, in which case the maximum
byte count could be left uninitialized. Initialize them to zero to
handle this case.
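A simplified sketch of the pattern, combining the single counter load
with the zero initialization. The structure, field and function names
are hypothetical, and volatile is used only to model counters that
another thread may update in this user-space example:

```c
/* Hypothetical per-ring counters that may be mutated concurrently. */
struct example_ring {
	volatile unsigned long	bytes;
	volatile unsigned long	packets;
};

static unsigned long
example_max_bytes_per_packet(const struct example_ring *tx,
    const struct example_ring *rx)
{
	unsigned long bytes, bytes_per_packet, packets, v;

	/* Stays 0 when bytes != 0 but packets == 0 on both rings. */
	bytes_per_packet = 0;

	/* Load each counter exactly once and only use the local copies. */
	bytes = tx->bytes;
	packets = tx->packets;
	if (bytes != 0 && packets != 0)
		bytes_per_packet = bytes / packets;

	bytes = rx->bytes;
	packets = rx->packets;
	if (bytes != 0 && packets != 0) {
		v = bytes / packets;
		if (v > bytes_per_packet)
			bytes_per_packet = v;
	}
	return (bytes_per_packet);
}
```

Because the locals have the same type as the counters, no truncation
occurs, and re-reading a counter between the zero check and the division
is not possible.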
PR: 286819
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50416
(cherry picked from commit e0267657f3965a56d877075fe3d4d41b8afb2faf)
Device ID for E830-XXV adapters was changed from 12D3
to 12DE. Update driver accordingly and bump version
number.
Also remove subdevice id for E830-XXV-4 for OCP 3.0,
which was cancelled.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Approved by: kbowling (mentor), erj (mentor)
Tested by: Gowthamkumar K S <gowtham.kumar.ks@intel.com>
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D50327
(cherry picked from commit 0fed8828c95a9d2cbcb43147ff851ca6f2c21d0f)
Some implementations of the virtio 9p transport are implemented on
virtio_mmio, e.g. the Arm FVP. Use the correct macro so the driver
attaches when this is the case.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D49600
If, when submitting a request, the virtqueue is full, we sleep until an
interrupt has fired, then restart the request. However, while sleeping
the channel lock is dropped, and in the meantime another thread may have
reset the per-channel SG list, so upon retrying we'd (re)submit whatever
happened to be left over in the previous request.
Fix the problem by rebuilding the SG list after sleeping.
Sponsored by: Klara, Inc.
- Remove superfluous newlines.
- Use bool literals.
- Replace an unneeded SYSINIT with static initialization.
No functional change intended.
Sponsored by: Klara, Inc.
When the module is loaded on a system running on qemu/kvm the "modern"
virtio infrastructure is used and virtio_read_device_config() will end
up calling vtpci_modern_read_dev_config(). This function cannot read
values of arbitrary sizes and will panic if the p9fs mount tag size is
not supported by it.
Use virtio_read_device_config_array() instead. It was tested on both
bhyve and qemu/kvm.
PR: 280098
Co-authored-by: Mark Peek <mp@FreeBSD.org>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1320
device_attach routines are allowed to sleep, and this routine already
has other M_WAITOK allocations.
Reported by: markj
Reviewed by: markj
Fixes: 1efd69f933b6 ("p9fs: move NULL check immediately after alloc...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45721
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.
Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.
To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:
vfs.root.mountfrom="p9fs:sharename"
for non-root filesystems add something like this to /etc/fstab:
sharename /mnt p9fs rw 0 0
In both examples, substitute the share name used on the bhyve command
line.
The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations. This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.
Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
When a channel is closed, dsp_close() either calls vchan_destroy() on vchans,
or chn_abort()/chn_flush() on primary channels. However, the problem with this
is that, when closing a vchan, we end up not terminating the stream properly.
The call sequence we are interested in is the following:
vchan_destroy(vchan) -> chn_kill(vchan) -> chn_trigger(vchan) ->
vchan_trigger(vchan) -> chn_notify(parent)
Even though chn_notify() contains codepaths which call chn_abort(parent),
apparently we do not execute any of those codepaths in this case, so the
DMA remains unterminated, hence why we keep seeing the primary
channel(s) being interrupted even once the application has exited:
root@freebsd:~ # sndctl interrupts
dsp0.play.0.interrupts=1139
dsp0.record.0.interrupts=0
root@freebsd:~ # sndctl interrupts
dsp0.play.0.interrupts=1277
dsp0.record.0.interrupts=0
root@freebsd:~ # sndctl interrupts
dsp0.play.0.interrupts=1394
dsp0.record.0.interrupts=0
The only applications that do not have this issue are those (e.g., mpv) that
manually call ioctls which end up calling chn_abort(), like SNDCTL_DSP_HALT, to
abort the channel(s) during shutdown. For all other applications that do not
manually abort the channel(s), we can confirm that chn_abort()/chn_flush(), or
even chn_trigger(PCMTRIG_ABORT) on the parent, doesn't happen during shutdown.
root@freebsd:~ # dtrace -n 'fbt::chn_abort:entry,fbt::chn_flush:entry { printf("%s", args[0]->name); stack(); }'
dtrace: description 'fbt::chn_abort:entry,fbt::chn_flush:entry ' matched 2 probes
dtrace: buffer size lowered to 1m
^C
[...]
root@freebsd:~ # dtrace -n 'fbt::chn_trigger:entry /args[1] == -1/ { printf("%s", args[0]->name); stack(); }'
dtrace: description 'fbt::chn_trigger:entry ' matched 1 probe
dtrace: buffer size lowered to 1m
CPU ID FUNCTION:NAME
0 68037 chn_trigger:entry dsp0.virtual_play.0
sound.ko`chn_kill+0x134
sound.ko`vchan_destroy+0x94
sound.ko`dsp_close+0x39b
kernel`devfs_destroy_cdevpriv+0xab
kernel`devfs_close_f+0x63
kernel`_fdrop+0x1a
kernel`closef+0x1e3
kernel`closefp_impl+0x76
kernel`amd64_syscall+0x151
kernel`0xffffffff8103841b1
To fix this, modify dsp_close() to execute the primary channel case on both
primary and virtual channels. While what we really care about are the
chn_abort()/chn_flush() calls, it shouldn't hurt to call the rest of the
functions on the vchans as well, to avoid complicating things; they get deleted
right below, anyway.
With the patch applied:
root@freebsd:~ # dtrace -n 'fbt::chn_trigger:entry /args[1] == -1/ { printf("%s", args[0]->name); stack(); }'
dtrace: description 'fbt::chn_trigger:entry ' matched 1 probe
dtrace: buffer size lowered to 1m
CPU ID FUNCTION:NAME
1 68037 chn_trigger:entry dsp0.virtual_play.0
sound.ko`chn_flush+0x2a
sound.ko`dsp_close+0x330
kernel`devfs_destroy_cdevpriv+0xab
kernel`devfs_close_f+0x63
kernel`_fdrop+0x1a
kernel`closef+0x1e3
kernel`closefp_impl+0x76
kernel`amd64_syscall+0x151
kernel`0xffffffff8103841b
0 68037 chn_trigger:entry dsp0.play.0
sound.ko`chn_notify+0x4ce
sound.ko`vchan_trigger+0x105
sound.ko`chn_trigger+0xb4
sound.ko`chn_flush+0x2a
sound.ko`dsp_close+0x330
kernel`devfs_destroy_cdevpriv+0xab
kernel`devfs_close_f+0x63
kernel`_fdrop+0x1a
kernel`closef+0x1e3
kernel`closefp_impl+0x76
kernel`amd64_syscall+0x151
kernel`0xffffffff8103841b
Above we can see a chn_trigger(PCMTRIG_ABORT) on the parent (dsp0.play.0),
which is coming from the chn_abort() (inlined) in chn_notify():
root@freebsd:~ # dtrace -n 'kinst::chn_abort:entry { stack(); }'
dtrace: description 'kinst::chn_abort:entry ' matched 5 probes
dtrace: buffer size lowered to 1m
CPU ID FUNCTION:NAME
1 72580 chn_notify:1192
sound.ko`0xffffffff8296cab4
sound.ko`vchan_trigger+0x105
sound.ko`chn_trigger+0xb4
sound.ko`chn_flush+0x2a
sound.ko`dsp_close+0x330
kernel`devfs_destroy_cdevpriv+0xab
kernel`devfs_close_f+0x63
kernel`_fdrop+0x1a
kernel`closef+0x1e3
kernel`closefp_impl+0x76
kernel`amd64_syscall+0x151
kernel`0xffffffff8103841b
We can also confirm the primary channel(s) are not interrupted anymore:
root@freebsd:/mnt/src # sndctl interrupts
dsp0.play.0.interrupts=0
dsp0.record.0.interrupts=0
In collaboration with: adrian
Tested by: adrian, christos, thj
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: thj, adrian, emaste
Differential Revision: https://reviews.freebsd.org/D50488
(cherry picked from commit f6430bc61df78be070209d52b4452ae9cf4cd015)
(cherry picked from commit 0c6aa445ec0c85e7c9653d20562907742569de6f)
Approved by: re (cperciva)
Power down the device on shutdown similar to what is done in the case
of suspend. The device may fail to attach on next boot without this.
PR: 286385
Reviewed by: christos, adrian
Differential Revision: https://reviews.freebsd.org/D50306
(cherry picked from commit d9900b9ea2b27f7a0c2eda97841b9499e02e3ea7)
(cherry picked from commit 77521692f4c71213c5419268657e696532c28325)
Approved by: re (cperciva)
Changes since 2.8.0:
Bug Fixes:
* Fix LLQ normal width misconfiguration
* Check for errors when detaching children first, not last
Minor Changes:
* Remove \n from sysctl description
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D50041
(cherry picked from commit 59b30c1a864ee8a22c2e9912301cb88674f714c9)
Patch 0a33c047a443 introduced new values to
hw.ena.force_large_llq_header. The default value of 2 means no
preference, while 0 and 1 act as the previous false and true
respectively, which allowed forcefully setting regular or large LLQ.
There are 2 ways to force the driver to select regular LLQ:
1. Setting hw.ena.force_large_llq_header = 0 via sysctl.
2. Turning on ena express, which makes the recommendation by the FW to
be regular LLQ.
When the device supports large LLQ but the driver is forced to
regular LLQ, llq_config->llq_ring_entry_size_value is never initialized
and since it is a variable allocated on the stack, it stays garbage.
Since this variable is involved in calculating max_entries_in_tx_burst,
it could cause the maximum burst size to be zero. This causes the driver
to ignore the real maximum burst size of the device, leading to driver
resets in devices that have a maximum burst size (Nitro v4 and later; see
[1] for more information).
In case the garbage value is 0, the calculation of
max_entries_in_tx_burst divides by 0 and causes kernel panic.
The patch modifies the logic to take into account all use-cases and
ensure that the relevant fields are properly initialized.
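The failure mode and the intent of the fix can be sketched as follows.
This is a simplified model, not the driver code: the structure, the
helper names and the 128/256 byte entry sizes are assumptions made for
the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed entry sizes for regular and large LLQ, in bytes. */
#define	EXAMPLE_LLQ_ENTRY_REGULAR	128
#define	EXAMPLE_LLQ_ENTRY_LARGE		256

struct example_llq_config {
	uint32_t	llq_ring_entry_size_value;
};

/* Initialize the entry size on every path, including forced regular LLQ. */
static void
example_set_llq_config(struct example_llq_config *cfg,
    bool device_supports_large, bool use_large)
{
	if (device_supports_large && use_large)
		cfg->llq_ring_entry_size_value = EXAMPLE_LLQ_ENTRY_LARGE;
	else
		cfg->llq_ring_entry_size_value = EXAMPLE_LLQ_ENTRY_REGULAR;
}

/* Burst limit in entries; an entry size of 0 would otherwise divide by 0. */
static uint16_t
example_max_entries_in_tx_burst(uint32_t max_tx_burst_size,
    const struct example_llq_config *cfg)
{
	if (cfg->llq_ring_entry_size_value == 0)
		return (0);
	return ((uint16_t)(max_tx_burst_size /
	    cfg->llq_ring_entry_size_value));
}
```

With the field left uninitialized, the division would use whatever value
happened to be on the stack; setting it in both branches removes that
dependency.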
[1]: https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-nitro-instances.html
Fixes: 0a33c047a443 ("ena: Support LLQ entry size recommendation from device")
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D50040
(cherry picked from commit 56c45700f2ae15755358f2da8266247613c564df)
This change was made after feedback from upstream, aiming to align with
the style guide for consistent log formatting. No functional changes
were made to the driver, only the formatting of the log messages.
Reviewed by: ssaxena, imp
Differential Revision: https://reviews.freebsd.org/D49799
(cherry picked from commit 4494ea5406f79a6cb2d3631a723eb286ca96a9b9)