Virtual Functions have access to a limited number of registers,
and their bus space size is lower. Use KASSERT to detect out-of-bounds
access and eliminate them to avoid kernel panics in production
environment.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: jmg
Tested by: mateusz.moga_intel.com
Approved by: kbowling (mentor), erj (mentor)
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D52976
(cherry picked from commit 2c02e6ca7154593d214b62578f67d9fe7db23d70)
Add device IDs and branding strings for E835 adapters.
This is a follow up for E830 adapters with Security Protocol
and Data Model (SPDM) support and RDMA support available
on 100 and 200Gbps links.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Approved by: kbowling (mentor), erj (mentor)
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D52782
(cherry picked from commit b202176dc76d862f886778439b96dd1243d8b999)
This is part 1 of the support for the new Intel Ethernet E610 family of devices.
Introduce new PCI device IDs:
• 57AE: Intel(R) E610 (Backplane)
• 57AF: Intel(R) E610 (SFP)
• 57B0: Intel(R) E610 (10 GbE)
• 57B1: Intel(R) E610 (2.5 GbE)
• 57B2: Intel(R) E610 (SGMII)
Key updates for E610 family:
• Firmware manages Link and PHY
• Implement new CSR-based Admin Command Interface (ACI) for SW-FW interaction
• Tested exclusively for x64 operating systems on E610-XT2/XT4 (10G) and E610-IT4 (2.5G)
• Enable link speeds above 1G: 2.5G, 5G and 10G
• NVM Recovery Mode and Rollback support
Signed-off-by: Yogesh Bhosale yogesh.bhosale@intel.com
Co-developed-by: Krzysztof Galazka krzysztof.galazka@intel.com
Approved by: kbowling (mentor), erj (mentor)
Tested by: gowtham.kumar.ks_intel.com
Sponsored by: Intel Corporation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50067
(cherry picked from commit dea5f973d0c8d29a79b433283d0a2de8f4615957)
This change reapplies the improvements from commit 89e7335 and adds
additional fixes and code optimizations on top of it.
The ixl driver supports up to 128 multicast filters in hardware. When this
limit is exceeded, the driver should enable multicast promiscuous mode.
When the count drops below 128, it should disable promiscuous mode and
restore individual filters.
The driver previously had problems that could corrupt multicast filters list.
The main issue was that ixl_dis_multi_promisc() would attempt to disable
promiscuous mode without checking if it was actually enabled, potentially
corrupting existing filters. There was also no state tracking across driver
functions, leading to redundant operations.
This change adds an IXL_FLAGS_MC_PROMISC flag to track the multicast
promiscuous mode state. The flag is set when enabling promiscuous mode and
cleared when disabling it. Early return checks prevent redundant operations
when the mode is already in the desired state, avoiding filter corruption
and unnecessary hardware calls.
Signed-off-by: Yogesh Bhosale yogesh.bhosale@intel.com
PR: 283820
Approved by: kbowling (mentor)
Tested by: gowtham.kumar.ks_intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D52549
(cherry picked from commit 46a8a1f08f88c278e60ebb6daa7a551eb641c67b)
According to section 5.1.6.2.1 of version 1.3 of the virtio
specification, the driver MUST NOT set VIRTIO_NET_HDR_F_DATA_VALID in
the flags. So don't do that.
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D53650
(cherry picked from commit 836b3cd9d7910aff5225e9e58189067ca03fae30)
Transmit segment offloading depends on transmit checksum offloading.
Enforce that constraint. This also fixes a bug, since if_hwassist bits
are from the CSUM_ space, not from the IFCAP_ space.
PR: 290773
Reviewed by: Timo Völker
Tested by: lg@efficientip.com
Differential Revision: https://reviews.freebsd.org/D53629
(cherry picked from commit 4c50ac68166caf7e08c5a9984d63fa91490fa50d)
- Change vm_page_reclaim_contig[_domain] to return an errno instead
of a boolean. 0 indicates a successful reclaim, ENOMEM indicates
lack of available memory to reclaim, with any other error (currently
only ERANGE) indicating that reclamation is impossible for the
specified address range. Change all callers to only follow
up with vm_page_wait* in the ENOMEM case.
- Introduce vm_domainset_iter_ignore(), which marks the specified
domain as unavailable for further use by the iterator. Use this
function to ignore domains that can't possibly satisfy a physical
allocation request. Since WAITOK allocations run the iterators
repeatedly, this avoids the possibility of infinitely spinning
in domain iteration if no available domain can satisfy the
allocation request.
PR: 274252
Reported by: kevans
Tested by: kevans
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D42706
(cherry picked from commit 2619c5ccfe1f7889f0241916bd17d06340142b05)
MFCed as a prerequisite for further MFC of VM domainset changes. Based
on analysis, it would not hurt, and I have been using it in productions
for months now.
Resolved the trivial conflict due to commit 718d1928f874 ("LinuxKPI:
make linux_alloc_pages() honor __GFP_NORETRY") having been MFCed before
this one.
The type of variable promisc and allmulti was changed from int to bool
by commit [1].
[1] 7dce56596f Convert to if_foreach_llmaddr() KPI
MFC after: 3 days
(cherry picked from commit 80dfed11fc1c61ce9168db01dee263447619e859)
Keep the hwassist flags for transmit checksum offload and transmit
segment offload in sync with the enabled capabilities.
Reported by: Timo Völker
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52765
(cherry picked from commit f2575d56c8c9a8acad4a61a3586546dff4febce1)
Hardware TCP LRO results in problems in settings with IP forwarding
being enabled. In case of nodes without IP forwarding, using
software LRO is also beneficial in general, since it can provide better
information about what was received on the wire.
Therefore, disable hardware TCP LRO by default.
By tuning the loader tunable, this can be changed.
PR: 263229
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52684
(cherry picked from commit 6e4b811009d63f33c59d51f28fd4a030ca90843e)
Enable the handling of the IFCAP_RXCSUM_IPV6 handling by handling
IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 as a pair. Also make clear, that
software and hardware LRO require receive checksum offload.
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52682
(cherry picked from commit eaf619fddcb21859311b895a0836da3171a01531)
When ALTQ is enabled, this driver does "hardware" accounting and soft
accounting at the same time. Prefer the "hardware" one to make the logic
simpler.
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44817
(cherry picked from commit 2a346c8993cbb92a321a7c25bd9ac4dcaae352d1)
While here, advertise the IFCAP_HWSTATS capability to avoid the net
stack from double counting it.
Co-authored-by: zlei
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44816
(cherry picked from commit a14d561e58529c9686a2efc47f4828ad82026e63)
When transmitting a packet over the vtnet interface, map the
csum flags CSUM_DATA_VALID | CSUM_PSEUDO_HDR to the virtio
flag VIRTIO_NET_HDR_F_DATA_VALID.
When receiving a packet over the virtio network channel, translate
the virtio flag VIRTIO_NET_HDR_F_NEEDS_CSUM not to CSUM_DATA_VALID |
CSUM_PSEUDO_HDR, but to CSUM_TCP, CSUM_TCP_IPV6, CSUM_UDP, or
CSUM_UDP_IPV6.
The second change fixes a series of issue related to checksum
offloading for if_vtnet.
While there, improve the stats counters to allow a detailed view
on what is going on in relation to checksum offloading.
PR: 165059
Reviewed by: tuexen, manpages
Differential Revision: https://reviews.freebsd.org/D51686
(cherry picked from commit 3008f30d2c2cabdd7e17f7fb922139da8681ffbd)
Fix the aggregation of the interface level counters
* dev.vtnet.X.tx_task_rescheduled,
* dev.vtnet.X.tx_tso_offloaded,
* dev.vtnet.X.tx_csum_offloaded,
* dev.vtnet.X.rx_task_rescheduled,
* dev.vtnet.X.rx_csum_offloaded, and
* dev.vtnet.X.rx_csum_failed.
Also ensure that dev.vtnet.X.tx_defrag_failed only counts the number
of times m_defrag() fails.
While there, mark sysctl-variables used for exporting statistics as
such (CTLFLAG_STATS).
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D51999
(cherry picked from commit 03da4395158d374b5e38623f6744ce31302b530c)
Originally ixgbe_if_update_admin_status() only handled 1G and 10G speeds,
causing any other speeds to display as "1 Gbps" in link status logs.
This issue is fixed by adding link speed to string conversion logic through
the introduction of a helper function, ixgbe_link_speed_to_str(), which
corrects the misleading logs to reflect accurate link speeds.
Signed-off-by: Yogesh Bhosale yogesh.bhosale@intel.com
PR: 288960
Reported by: Mike Belanger - QNX
Differential Revision: https://reviews.freebsd.org/D52442
(cherry picked from commit 46347b3619757e3d683a87ca03efaf2ae242335f)
We can do so trivially, so make these tables read-only. No functional
change intended.
Reviewed by: cem, emaste
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D52003
(cherry picked from commit d5f55356a2fbf8222fb236fe509821e12f1ea456)
Add PNP info so it the module can be by devmatch(8) and automatically
loaded. On non-x86 platforms it is not included in GENERIC.
Reviewed by: imp
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D52557
(cherry picked from commit e13b5298ec87be03da2231bc7b44a6a4b976b850)
It may pass packets up the stack and so needs to be called in a network
epoch. When a watchdog timeout happens, we need to enter a section
explicitly.
Reviewed by: zlei, glebius, adrian
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D51885
(cherry picked from commit b653a281f5a977ba73b3d405874f8af8e8b6b50d)
When MSI or legacy interrupt is used driver controls wheter
queues can trigger an interrupt with the Interrupt Linked List.
While processing traffic first index of the list is set to EOL
value to stop queues from triggering interrupts. This index was
not reset to the correct value when driver attempted to re-enable
interrupts from queues, what prevented driver from processing any
traffic. Fix that by setting correct first index in the
ixl_if_enable_intr function.
While at that fix the comments style and make ixl_if_enable_intr
and ixl_if_disable_intr more consistent.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
PR: 288077
Suggested by: Mike Belanger <mibelanger@qnx.com>
Approved by: kbowling (mentor), erj (mentor)
Tested by: gowtham.kumar.ks_intel.com,
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D51331
(cherry picked from commit 6f41c1fc39d9fa9db989a7b4f325c3ab85b8fb45)
Summary:
These came in the original DrvAPI commits in 2014, and are obsoleted by
bpf_mtap_if() and ether_bpf_mtap_if(). The `_if` suffix, rather than
prefix, conveys that it's operating on the bpf of the interface, instead
than the interface itself.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41146
(cherry picked from commit 2a3716432d209c5fef1eb1a719f4c1914e7c8b5a)
Include opt_inet.h and opt_inet6.h early in the files including
virtio_net.h, since they use INET and/or INET6.
While there, remove redundant inclusion of sys/types.h, since it is
included already by sys/param.h.
There was a discussion to include opt_inet.h and opt_inet6.h also
in virtio_net.h. glebius suggested to add a mechanism for files
to check, if required opt_*.h files were included. virtio_net.h
will be the first consumer of this mechanism.
Reviewed by: glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52046
(cherry picked from commit 3077532b1bb2911d3012ee90bae9d9499c960569)
Remove an always-false check for whether the request has already
completed before sleeping. Even if the request is complete, the
response tag is updated while holding the channel lock, which is also
held here.
No functional change intended.
Sponsored by: Klara, Inc.
(cherry picked from commit 28c9b13b236d25512cfe4e1902411ff421a14b64)
Replace priorities specified by a base priority and some hardcoded
offset value by symbolic constants. Hardcoded offsets prevent changing
the difference between priorities without changing their relative
ordering, and is generally a dangerous practice since the resulting
priority may inadvertently belong to a different selection policy's
range.
Since RQ_PPQ is 4, differences of less than 4 are insignificant, so just
remove them. These small differences have not been changed for years,
so it is likely they have no real meaning (besides having no practical
effect). One can still consult the changes history to recover them if
ever needed.
No functional change (intended).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
(cherry picked from commit 8ecc41918066422d6788a67251b22d11a6efeddf)
This was recently added to Linux to improve incremental update support,
as you could previously add Allowed-IPs but not remove without replacing
the whole set (and thus, potentially disrupting existing traffic).
Removal is incredibly straightforward; we'll find it in p_aips first
to ensure that it's actually valid for this peer, then we'll delete it
from the radix tree before we remove the corresponding p_aips entry.
Reviewed by: Jason A. Donenfeld, jhb
(cherry picked from commit d15d610fac97df4fefed3f14b31dcfbdcec65bf9)
(cherry picked from commit d1ac3e245f084ee0637bde9a446687621358c418)
We'll re-use these in a future wg_aip_del() to perfectly reconstruct
what we expect to find in a_addr/a_mask.
Reviewed by: ivy, markj (both earlier version), Aaron LI, jhb
(cherry picked from commit 2475a3dab0d5c5614e303c0022a834f725e2a078)
The only difference in the wg_aip_add() call after IP validation is the
address family. Just pull that out into a variable and avoid the two
different callsites for wg_aip_add(). A future change will add a new
call for each case to remove an address from the peer, so it's nice to
avoid needing to repeat the logic for two different branches.
Reviewed by: Aaron LI, Jason A. Donenfeld, ivy, jhb, markj
(cherry picked from commit ba2607ae7dff17957d9e62ccd567ba716c168e77)
This was broken in c63d67e137f3, the early returns prevent building the
media lists as expected.
The BASE-T parts of the patch were suggested by "cyric@mm.st", while I
am adding the additional 40G AOC, 1CX, autoneg and unknown PHY fixes
based on code inspection. There may be additional work left here for
Broadcom but this is certainly better than the returns.
PR: 287395
Reported by: mickael.maillot@gmail.com, cyric@mm.st
Tested by: Einar Bjarni Halldórsson <einar@isnic.is>
(cherry picked from commit 5e6e4f752833acc96f1efc893318d3f6b74b9689)
The header file might be included after linux/stddef.h or others are
included and the macros would be re-defined.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D50156
(cherry picked from commit 152e6197615570e7a2f5f1c6c2ed00ecee9dd10c)
In order to be able to use MODULE_DEVICE_TABLE() with multiple bus
attachments, factor out the bus-specfic MODULE_PNP_INFO() and place
it next to the structure defining the table.
As it turns out bnxt(4) has been using the MODULE_DEVICE_TABLE() with
PCI attachments for the "auxillary" bus so far. That makes little sense.
Define the MODULE_PNP_INFO() to nothing for that. We may consider
pulling these LinucKPI bits in semi-native drivers into LinuxKPI
one day as that route is not really sustainabke.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, dumbbell
Differential Revision: https://reviews.freebsd.org/D51049
(cherry picked from commit 2f5666c1727c949491f73e6c3277b7b542131714)
Otherwise we can end up with a lost interrupt, causing lost request
completion wakeups and hangs in the filesystem layer.
Continue processing until we enable interrupts and then observe an empty
queue, like other virtio drivers do.
Sponsored by: Klara, Inc.
In (unknown) situations it seems the i2c bus can have trouble,
while nothing about the current link state has changed, the driver
would react by going into a link down state, and start busylooping
on up to 4 cores. Even if there was a valid link, such spinning
on a cpu by a kernel thread would wreak havoc to existing and
new connections.
This patch does the following:
1. If such a bus failure occurs, we keep the last known link state.
2. Prevent busy looping by implementing the lockmgr() facility to
be able to sleep while the i2c code waits on the i2c ISR. We cap
this with a timeout.
3. Pin the admin queues to the last CPU in the system, to prevent
other scenarios where busy looping might occur from landing on CPU
0, which especially seems to cause a lot of issues.
Given the design constraints both in hardware and in software,
the lockmgr() seems to be the only viable option, even though
FreeBSD explicitly forbids sleeping in callout context, but
fails to explain why this is or offer alternatives.
axgbe: revert allocating admin queues to last CPU
The issue was resolved in 52454a1e5b.
Scheduled threads such as CARP are now no longer pinned to CPU 0, making sure
they always get their time slice even if CPUs are blocked.
Since the I/O expander chip does not do a reset when soft power
cycling, the driver will first turn off all LEDs when initializing,
although no specific routine seems to be called when powering down.
This means that the LEDs will stay on until the driver has booted up,
after which the driver will be in a consistent state.
Initially, RSF (Receive Queue Store and Forward) was disabled for
unknown reasons, but the cut-through mode that's enabled as a result
seems to send 0 length packets up to the DMA when the RX queue is
full.
Since the iflib interface needs axgbe_pci_init() and its phy starting capabilities, no data was passed in its absence.
With the NULL check of the axgbe_miibus we also resort back to an MDIO read as a module might be capable of both
clause 22 and clause 45 methods of communication.
with the move of phy_stop() to if_detach() in d50d4e8cd4, it's better to prevent reconfiguring the phy should the pci_init() callout trigger more than once.