0480dccd3f tried to fix the MTU for software VLANs given dpni
announces IFCAP_VLAN_MTU. Unfortunately the initial MRU during
setup is reduced from the maximum supported by the HW to our
maximum ethernet RX frame length, so the fix there only takes
effect after further MTU toggles.
Set the maximum RX frame size (without CRC) to jumbo length +
vlan encap len by default given we also announce IFCAP_JUMBO_MTU.
While here improve the manual (ioctl) MTU setting by checking if
IFCAP_VLAN_MTU is currently enabled and only then add the extra
bytes.
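As a rough illustration of the length arithmetic described above, here is a
minimal sketch. It assumes a 14-byte Ethernet header, a 4-byte VLAN
encapsulation and a 9000-byte jumbo MTU; all ex_* names are hypothetical and
this is not the dpaa2 driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not taken from the driver). */
enum {
	EX_ETHER_HDR_LEN = 14,   /* destination + source + ethertype */
	EX_VLAN_ENCAP_LEN = 4,   /* 802.1Q tag */
	EX_JUMBO_MTU = 9000      /* jumbo payload size */
};

/* Default maximum RX frame length (without CRC): jumbo + VLAN encap. */
static uint32_t
ex_default_max_rx_frame(void)
{
	return (EX_ETHER_HDR_LEN + EX_JUMBO_MTU + EX_VLAN_ENCAP_LEN);
}

/*
 * Frame length for a manually set MTU: the VLAN bytes are added only
 * when the VLAN_MTU capability is currently enabled.
 */
static uint32_t
ex_frame_len_for_mtu(uint32_t mtu, bool vlan_mtu_enabled)
{
	uint32_t len = mtu + EX_ETHER_HDR_LEN;

	if (vlan_mtu_enabled)
		len += EX_VLAN_ENCAP_LEN;
	return (len);
}
```

With these assumptions a 1500-byte MTU gives 1518 bytes with VLAN_MTU enabled
and 1514 bytes without it.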
Fixes: 0480dccd3f
MFC after: 3 days
Reviewed by: dsl
Differential Revision: https://reviews.freebsd.org/D47066
Right now flags is set to 0 before this "=" -> "|=" change, but it will
matter when the NOT_YET section above becomes effective.
MFC after: 2 weeks
Sponsored by: Amazon
Ethernet drivers should respect IFF_PROMISC rather than IFF_PPROMISC.
The latter is for user-requested promisc mode, it implies the former
but not vice versa. Some in-kernel components such as if_bridge(4) and
bpf(4) will set promisc mode for interfaces on-demand.
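The difference between the two flags can be sketched as below. The flag
values are illustrative stand-ins for the kernel's IFF_PROMISC and
IFF_PPROMISC definitions, and the ex_* helpers are hypothetical rather than
code from any driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel flag bits (assumed values). */
#define EX_IFF_PROMISC	0x00100u	/* interface is in promisc mode */
#define EX_IFF_PPROMISC	0x20000u	/* user asked for promisc mode */

/* What a driver should test: the effective promisc state. */
static bool
ex_want_promisc(uint32_t if_flags)
{
	return ((if_flags & EX_IFF_PROMISC) != 0);
}

/* The mistaken test: only sees user-requested promisc mode. */
static bool
ex_want_promisc_buggy(uint32_t if_flags)
{
	return ((if_flags & EX_IFF_PPROMISC) != 0);
}
```

When an in-kernel consumer enables promisc mode, only the first bit is set,
so the second helper would miss it.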
While here, update the debugging message to be less confusing.
This was spotted while reviewing markj@'s work D46524.
Test from Franco shows that the interface seems to be unconditionally
initialized to promisc mode regardless of this fix. That needs further
investigation.
Reviewed by: markj, Franco Fichtner <franco@opnsense.org>
Tested by: Franco Fichtner <franco@opnsense.org>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46794
igc is derived from igb and has never had an AIM implementation. The
same algorithm from e1000 is appropriate here.
Upon more detailed study of the Linux driver which has a newer AIM
implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.
Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:
1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate, which is better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.
For a single NIC this might be worth a few sys% on common CPUs, but the
savings may be meaningful when multiplied, such as in if_lagg, if_bridge
and forwarding setups.
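The holdoff-timer view above can be sketched as a toy policy. This is not the
Intel or FreeBSD AIM algorithm; the thresholds and ex_* names are hypothetical
and only show how a shorter holdoff corresponds to a higher permitted
interrupt rate.

```c
#include <stdint.h>

/* Interrupts per second permitted by a holdoff of `usecs` microseconds. */
static uint32_t
ex_rate_from_holdoff_us(uint32_t usecs)
{
	return (usecs == 0 ? UINT32_MAX : 1000000u / usecs);
}

/*
 * Toy policy with made-up thresholds: small average packets suggest
 * latency-sensitive traffic (short holdoff), large ones suggest bulk
 * transfers (long holdoff).
 */
static uint32_t
ex_pick_holdoff_us(uint32_t bytes, uint32_t packets)
{
	uint32_t avg;

	if (packets == 0)
		return (125);		/* idle: static 8000/s default */
	avg = bytes / packets;
	if (avg < 300)
		return (50);		/* 20000/s, lower latency */
	if (avg < 1200)
		return (125);		/* 8000/s */
	return (250);			/* 4000/s, less overhead */
}
```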
The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.
[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf
MFC after: 1 week
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47053
We originally left this out because iflib modulates interrupts and
accomplishes some level of batching versus the custom queues in the
older driver. Upon more detailed study of the Linux driver which has a
newer implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.
Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:
1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate, which is better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.
For a single NIC this might be worth a few sys% on common CPUs, but the
savings may be meaningful when multiplied, such as in if_lagg, if_bridge
and forwarding setups.
The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.
I have retroactively added it to lem(4) and em(4) where the same concept
applies, albeit to a single ITR register.
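For the single ITR register case, the relation between a displayed interrupt
rate and the programmed interval can be sketched as follows. This assumes the
interval is counted in 256 ns units; the ex_* helpers are hypothetical and not
the driver's macros.

```c
#include <stdint.h>

#define EX_ITR_UNIT_NS	256u	/* assumed granularity of the ITR interval */

/* Interval value to program for a target interrupt rate. */
static uint32_t
ex_itr_from_rate(uint32_t ints_per_sec)
{
	if (ints_per_sec == 0)
		return (0);	/* 0 is treated here as "no moderation" */
	return ((uint32_t)(1000000000ull /
	    ((uint64_t)ints_per_sec * EX_ITR_UNIT_NS)));
}

/* Interrupt rate implied by a programmed interval value. */
static uint32_t
ex_rate_from_itr(uint32_t itr)
{
	if (itr == 0)
		return (0);
	return ((uint32_t)(1000000000ull / ((uint64_t)itr * EX_ITR_UNIT_NS)));
}
```

Under that assumption the static 8000/s setting corresponds to an interval
value of 488, which reads back as roughly 8004/s because of integer rounding.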
[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf
Tested by: cc (https://wiki.freebsd.org/chengcui/testD46768)
MFC after: 1 week
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D46768
The units of the size reported in the 'sectors' xenbus node is always 512b,
regardless of the value of the 'sector-size' node. The sector offsets in
the ring requests are also always based on 512b sectors, regardless of the
'sector-size' reported in xenbus.
Fix both blkfront and blkback to assume 512b sectors in the required fields.
The blkif.h public header has been recently updated in upstream Xen repository
to fix the regressions in the specification introduced by later modifications,
and clarify the base units of xenstore and shared ring fields.
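The unit rule above can be sketched as below. The ex_* helpers are
hypothetical and are not the blkfront or blkback code; they only show that
the 'sectors' node and ring offsets use 512-byte units whatever 'sector-size'
says.

```c
#include <stdint.h>

#define EX_XEN_SECTOR_SIZE	512u	/* fixed unit for xenbus and ring */

/* Disk size in bytes from the 'sectors' node. */
static uint64_t
ex_disk_bytes(uint64_t sectors_node)
{
	return (sectors_node * EX_XEN_SECTOR_SIZE);
}

/* Number of logical blocks of size 'sector_size' on the disk. */
static uint64_t
ex_disk_blocks(uint64_t sectors_node, uint32_t sector_size)
{
	return (ex_disk_bytes(sectors_node) / sector_size);
}

/* Ring request sector number for logical block 'lba'. */
static uint64_t
ex_ring_sector(uint64_t lba, uint32_t sector_size)
{
	return (lba * (sector_size / EX_XEN_SECTOR_SIZE));
}
```

For a 4096-byte 'sector-size', logical block 10 would be submitted as ring
sector 80, and a 'sectors' value of 2048 still means 1 MiB.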
PR: 280884
Reported by: Christian Kujau
MFC after: 1 week
Sponsored by: Cloud Software Group
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D46756
When we can't set the number of I/O queues on the admin queue, we
continue on. However, we don't create the I/O queue structures, so
having pointers (NULL) into them for sysctls makes no sense and leads to
a panic when accessed. When summing up different stats, also skip the
ioq stats when it's NULL.
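The NULL check when summing stats can be sketched as below, using simplified
hypothetical structures rather than the real nvme(4) softc.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the controller structures. */
struct ex_qpair {
	uint64_t num_cmds;
};

struct ex_ctrlr {
	struct ex_qpair adminq;
	struct ex_qpair *ioq;		/* NULL if I/O queues were not created */
	uint32_t num_io_queues;
};

/* Sum a counter over all queues, skipping the I/O queues when absent. */
static uint64_t
ex_total_cmds(const struct ex_ctrlr *c)
{
	uint64_t total = c->adminq.num_cmds;
	uint32_t i;

	if (c->ioq != NULL) {
		for (i = 0; i < c->num_io_queues; i++)
			total += c->ioq[i].num_cmds;
	}
	return (total);
}
```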
Sponsored by: Netflix
adapter->flags are guarded by a synch_op, as noted in the comment in
adapter.h where the flags are defined.
Fixes: 5241b210a4 cxgbe(4): Basic infrastructure for ULDs to participate in adapter reset.
MFC after: 1 week
Sponsored by: Chelsio Communications
The code for PV suspend/resume support has long been removed, also remove the
copyright notice associated with it.
There are still two copyright blocks with (to my understanding) slightly
different wordings of the BSD 2 clause license. I however don't feel like
merging them due to those wording differences.
The removal of the PV suspend/resume code was done in
ed95805e90.
Sponsored by: Cloud Software Group
Reviewed by: imp
Differential revision: https://reviews.freebsd.org/D46860
Some toolstacks won't attempt to signal power actions on xenbus unless the VM
explicitly exposes support for them. FreeBSD supports all power actions, hence
signal such support on xenbus by setting the nodes to the value of "1".
Sponsored by: Cloud Software Group
Reviewed by: markj
Differential review: https://reviews.freebsd.org/D46859
Create an additional 4 channel pcm device for RME HDSP 9632 sound cards,
to support the optional AO4S-192 and AI4S-192 extension boards. For
simplicity, the <HDSP 9632 [ext]> pcm device is always present, even if
the extension boards are not installed.
Unfortunately I cannot test this with actual hardware, but I made sure
the additional channels do not affect the functionality of the HDSP 9632
as currently in src.
Reviewed by: christos, br
Differential Revision: https://reviews.freebsd.org/D46837
Fix unified pcm mode after support for the AO4S-192 and AI4S-192
extension boards was added. Adjust the man page accordingly.
Reviewed by: br
Differential Revision: https://reviews.freebsd.org/D46946
Some Fujitsu Lifebooks return an invalid _BIX object. The first element
of _BIX is a revision number, which indicates what elements will follow:
* ACPI 4.0 defined _BIX revision 0 with 20 elements.
* ACPI 6.0 introduced _BIX revision 1 with 21 elements.
The problem is that the offending Lifebooks have a non-zero _BIX
revision, but provide 20 fields only.
The ACPICA parser chokes on this [1], but that seems to be
inconsequential. More importantly, our own battery info handling code [2]
also verifies that for revision > 0, there are at least 21 fields - and
refuses to process the invalid _BIX. One workaround would be to
introduce special case / quirk handling for Fujitsu Lifebooks. A better
one is to relax the requirements check: If there are only 20 elements,
treat the _BIX as revision 0, no matter what revision number was
provided by the device.
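The relaxed check can be sketched as below. The helper name and return
convention are hypothetical and this is not the acpi_cmbat code; it only
encodes the rule that a 20-element package is handled as revision 0.

```c
#include <stdint.h>

enum {
	EX_BIX_REV0_FIELDS = 20,	/* ACPI 4.0 */
	EX_BIX_REV1_FIELDS = 21		/* ACPI 6.0 */
};

/*
 * Return the revision to use when parsing a _BIX package with 'count'
 * elements, or -1 if the package is too short to be usable.
 */
static int
ex_bix_effective_revision(uint32_t reported_rev, uint32_t count)
{
	if (count < EX_BIX_REV0_FIELDS)
		return (-1);
	if (count < EX_BIX_REV1_FIELDS)
		return (0);	/* only 20 fields: ignore the reported revision */
	return ((int)reported_rev);
}
```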
Linux doesn't run into this problem by the way because it only supports
the 20 fields defined in the ACPI 4.0 spec [3]. It never looks at the
revision number or the 21st field added in ACPI 6.0.
[1] https://cgit.freebsd.org/src/tree/sys/contrib/dev/acpica/components/namespace/nsprepkg.c#n815
[2] https://cgit.freebsd.org/src/tree/sys/dev/acpica/acpi_cmbat.c#n371
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/battery.c#n418
PR: 252030
Reviewed by: imp
MFC After: 2 weeks
* Disable IFCAP_TOE automatically on all ifnets on all adapters during
unload. This is user-friendly and avoids panics due to stale ifnet
state after t4_tom is unloaded.
* Do not allow unload if tids are in use by the TOE on any adapter.
Reported by: Bimal Abraham @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
This chipset suffered an (un)usual number of bugs and iterations. Let's
add our NVM/firmware code from e1000 and the similar igc_nvm function
from DPDK to keep track of issues.
MFC after: 1 week
Sponsored by: BBOX.io
igc, derived from igb, does not use these registers. All interrupt
timing is governed by EITR or LLI and driven by write-back.
MFC after: 1 week
Sponsored by: BBOX.io
Rather than compute ilog2(roundup_pow_of_two(x)), which invokes ilog2
twice, just use order_base_2 once. And employ that optimization
twice.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46838
It's faster to use is_power_of_2 than it is to compute
roundup_pow_of_two and then compare. So do that.
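The two ways of testing for a power of two can be compared with a small
userland sketch. The ex_* functions are hypothetical stand-ins for the
LinuxKPI helpers, and the round-up loop assumes n is at most 2^63.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct test: a power of two has exactly one bit set. */
static bool
ex_is_power_of_2(uint64_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/* Smallest power of two >= n (assumes n <= 2^63). */
static uint64_t
ex_roundup_pow_of_two(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/* Slower equivalent: round up, then compare with the input. */
static bool
ex_is_power_of_2_slow(uint64_t n)
{
	return (n != 0 && ex_roundup_pow_of_two(n) == n);
}
```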
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46838
It's faster to use ispower2(n) than it is to compute
roundup_pow_of_two and do a comparison. So do the former.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46838
Based on the definitions, ilog2(roundup_pow_of_two(x)) ==
order_base_2(x). Replace the former with the latter in a few places to
save a few calculations.
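The identity can be checked with a userland sketch. The ex_* functions are
hypothetical reimplementations, not the LinuxKPI macros, and they assume
1 <= x <= 2^63.

```c
#include <stdint.h>

/* floor(log2(n)) for n > 0. */
static unsigned
ex_ilog2(uint64_t n)
{
	unsigned r = 0;

	while ((n >>= 1) != 0)
		r++;
	return (r);
}

/* Smallest power of two >= n (assumes n <= 2^63). */
static uint64_t
ex_roundup_p2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/* ceil(log2(n)) for n > 0, computed with a single log. */
static unsigned
ex_order_base_2(uint64_t n)
{
	return (n <= 1 ? 0 : ex_ilog2(n - 1) + 1);
}
```

For example, x = 5 rounds up to 8, whose log is 3, and ex_order_base_2(5)
yields 3 directly.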
Reviewed by: bz, kib
Differential Revision: https://reviews.freebsd.org/D46827
The E1000_EITR() macro is already multiplying by 0x4 which is the same
as this shift, so we were shifting more than expected.
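One way such double scaling can arise is sketched below. The macro is a
hypothetical stand-in with an assumed base offset, and the buggy helper is
only an illustration, not the code that was removed.

```c
#include <stdint.h>

/* Hypothetical register macro: base offset plus 4 bytes per vector. */
#define EX_EITR_BASE	0x01680u	/* assumed base, for illustration */
#define EX_EITR(n)	(EX_EITR_BASE + (0x4u * (n)))

/* Correct: the macro already scales the vector index by 4. */
static uint32_t
ex_eitr_offset(uint32_t vector)
{
	return (EX_EITR(vector));
}

/* Buggy: shifting first scales the index by 16 in total. */
static uint32_t
ex_eitr_offset_buggy(uint32_t vector)
{
	return (EX_EITR(vector << 2));
}
```

The two agree only for vector 0; for vector 1 the buggy form lands 12 bytes
past the intended register.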
MFC after: 6 days
Sponsored by: BBOX.io
The pages are inserted into the added slist if the entry parameter is
passed to iommu_pgfree(). For now it is a nop.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Avoid a race condition when accessing guest memory, by reading memory
contents only once.
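The read-once pattern can be sketched in userland as below. The structure and
names are hypothetical; `volatile` stands in for memory the guest may change
at any time, and this is not the bhyve or beri virtio code.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_RING_SIZE	8u

/* Hypothetical guest-writable structure. */
struct ex_shared {
	volatile uint16_t idx;
};

/*
 * Read the guest-controlled index exactly once into a private copy,
 * validate the copy, and hand out only the copy. Re-reading sh->idx
 * after the check would let the guest change it in between.
 */
static bool
ex_read_index_once(const struct ex_shared *sh, uint16_t *out)
{
	uint16_t idx = sh->idx;		/* the single read */

	if (idx >= EX_RING_SIZE)
		return (false);
	*out = idx;
	return (true);
}
```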
This has also been applied to _vq_record() in
sys/dev/beri/virtio/virtio.c, as per markj@'s suggestion.
Reported by: Synacktiv
Reviewed by: markj
Security: HYP-10
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45735
Provide macros to derive the various needed values and make it a bit
more clear the differences between em and igb.
The igb default EITR was not landing at the right offset.
Respect the 'max_interrupt_rate' tunable.
MFC after: 1 week
Sponsored by: BBOX.io
The absolute and packet timers only apply to lem and em, with some only
applying to the latter.
This cleans up the sysctl tree to only show these where applicable and
stops writing to unexpected registers for igb.
MFC after: 1 week
Sponsored by: BBOX.io
Summary:
It can be useful to see what quirks are applied on an SDHCI slot.
Obtained from: Juniper Networks, Inc.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D46790
Fix the ordering of priv data creation with setting priv data. This
handles failure better and resolves a panic when repeatedly running
tools/tools/gpioevents.
Explicitly initialise more fields in priv data while we are here.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46568
nvmf_submit_request() handles races with concurrent queue pair
destruction (or the queue pair being destroyed between
nvmf_allocate_request and nvmf_submit_request), so the lock is not
needed here. This avoids holding the lock across transport-specific
logic such as queueing mbufs for PDUs to a socket buffer, etc.
Holding the lock across nvmf_allocate_request() ensures that the queue
pair pointers in the softc are still valid as shutdown attempts will
block on the lock before destroying the queue pairs.
Sponsored by: Chelsio Communications
The last reference on a pending I/O request might be held by an mbuf
in the socket buffer. When this mbuf is freed, the I/O request is
completed which triggers completion of the CCB. However, this can
occur with locks held (e.g. with so_snd locked when the mbuf is freed
by sbdrop()) raising a LOR between so_snd and the CAM device lock.
Instead, defer CCB completion processing to a thread where locks are
not held.
Sponsored by: Chelsio Communications
Based on sysinit_sub_id, SI_SUB_CLOCKS is after SI_SUB_CONFIGURE.
SI_SUB_CONFIGURE = 0x3800000, /* Configure devices */
At this stage, the variable “cold” will be set to 0.
SI_SUB_CLOCKS = 0x4800000, /* real-time and stat clocks*/
At this stage, the clock configuration will be done, and the real-time
clock can be used.
In the e1000 driver, if the API safe_pause_* are called between
SI_SUB_CONFIGURE and SI_SUB_CLOCKS stages, it will choose the wrong
clock source. The API safe_pause_* uses “cold” the value of which is
updated in SI_SUB_CONFIGURE, to decide if the real-time clock source is
ready. However, the real-time clock is not ready until the SI_SUB_CLOCKS
routines are done.
Obtained from: Juniper Networks
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42920
Tracers have to be recreated after a restart but that's okay given that
they are used for debugging only.
MFC after: 1 week
Sponsored by: Chelsio Communications
In order to better integrate modern LinuxKPI USB this tries to reduce
a contention point of "LIST". Given there is no need to use a LIST here
change it to SLIST to avoid conflicts.
It is a workaround which does not solve the actual problem (overlapping
namespaces) but it helps us a lot for now.
Sponsored by: The FreeBSD Foundation
X-MFC? unclear
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D46534
Some block devices may request datamove operations from an ithread
context while holding locks. Queue datamove operations to a taskqueue
backed by a thread pool to safely permit blocking allocations, etc. in
datamove handling.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46551