This is a retread of https://reviews.freebsd.org/D34449 which I think
will fix the issue for the remote side not supporting autoneg. We now
attempt an autoneg, and if that fails fall back to the current code
that forces the link speed/duplex.
The original intent of this patch is to inform the remote switch of
duplex settings when we (the client) are specifying a fixed 10 or 100
speed. Otherwise it may get the duplex setting wrong.
The tricky case is when the remote (switch) side is fixing its
speed AND duplex while disabling autoneg and we (client) need to do
the same, which still seems to be common enough at some ISPs.
Original commit message follows:
Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.
By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".
This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.
The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Tested by: Franco Fichtner <franco@opnsense.org>
Tested by: J.R. Oldroyd <fbsd@opal.com> (previous version)
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47336
(cherry picked from commit bceec3d80a3caf9249e24247fb937674bf5b46b5)
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to change the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44259
(cherry picked from commit 90853dfac851afa9e8840f5a38383256d75458b6)
* Adds support for SFPs that are not correctly coded as an SFP
transceiver. i.e. Coherent-Finisar FCLF8522P2BTL.
* Configures multi-rate SFPs i.e. Coherent-Finisar FCLF8522P2BTL as
SGMII so they can do 10/100/1000 auto-negotiation.
* Adds support for 100BaseLX SGMII transceivers.
* Some code cleanup and additional debugging.
Reviewed by: emaste, markj, Franco Fichtner <franco@opnsense.org>
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47337
(cherry picked from commit 15853a5fc9548d9805a2ef22f24e2eb580198341)
This roughly corresponds to Linux 9d9e5347b035412daa844f884b94a05bac94f864
For FreeBSD this is currently not expected to cause any difference in
behavior because we do not have the MTP force smbus changes for modern
standby.
Sponsored by: BBOX.io
(cherry picked from commit 77b70ad751dfe3b115112252a5b96f504fcc2e0a)
Drop variable names of function prototypes since the file is mixed in
listing them or not and they fall out of sync.
Sponsored by: BBOX.io
(cherry picked from commit 9dc452b983f042aa9d2a2562517d44c1928ff42a)
We originally left this out because iflib modulates interrupts and
accomplishes some level of batching versus the custom queues in the
older driver. Upon more detailed study of the Linux driver which has a
newer implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.
Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:
1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate.. better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.
For a single NIC this might be worth a few sys% on common CPUs, but may
be meaningful when multiplied such as if_lagg, if_bridge and forwarding
setups.
The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.
I have retroactively added it to lem(4) and em(4) where the same concept
applies, albeit to a single ITR register.
[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf
Tested by: cc (https://wiki.freebsd.org/chengcui/testD46768)
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D46768
(cherry picked from commit 3e501ef896671cb190e8c40c6258b8f27d136f07)
Provide macros to derive the various needed values and make it a bit
more clear the differences between em and igb.
The igb default EITR was not landing at the right offset.
Respect the 'max_interrupt_rate' tunable.
Sponsored by: BBOX.io
(cherry picked from commit 9bf9164fc8aad1ca828c725413e06462aef1e4dd)
The absolute and packet timers only apply to lem and em with some only
applying to the later.
This cleans up the sysctl tree to only show these where applicable and
stops writing to unexpected registers for igb.
Sponsored by: BBOX.io
(cherry picked from commit 1c578f1c93112d3d53820f8b6b411488b83f0ab2)
Based on sysinit_sub_id, SI_SUB_CLOCKS is after SI_SUB_CONFIGURE.
SI_SUB_CONFIGURE = 0x3800000, /* Configure devices */
At this stage, the variable “cold” will be set to 0.
SI_SUB_CLOCKS = 0x4800000, /* real-time and stat clocks*/
At this stage, the clock configuration will be done, and the real-time
clock can be used.
In the e1000 driver, if the API safe_pause_* are called between
SI_SUB_CONFIGURE and SI_SUB_CLOCKS stages, it will choose the wrong
clock source. The API safe_pause_* uses “cold” the value of which is
updated in SI_SUB_CONFIGURE, to decide if the real-time clock source is
ready. However, the real-time clock is not ready til the SI_SUB_CLOCKS
routines are done.
Obtained from: Juniper Networks
Differential Revision: https://reviews.freebsd.org/D42920
(cherry picked from commit 930a1e6f3d2dd629774f1b48b1acf7ba482ab659)
The write is only used to toggle the debug print function and this is
otherwise stateless.
(cherry picked from commit 5f6964d9fbf663f85ee60dae7dfff153b82759d8)
Bump to the current out of tree driver version since we only have some
gratuitous changes.
(cherry picked from commit ddfec1fb6814088abc5805f45c4a18c5731d51b9)
DPDK commit message
net/e1000/base: fix link power down
Current code is a result of work to reduce duplication between various
device models. However, the logic that was replaced did not exactly
match the new logic, and as a result the link power down was not
working correctly for some NICs, and the link remained up even when
the interface is down.
Fix it to correctly power down the link under all circumstances that
were supported by old logic.
Fixes: 44dddd1 ("net/e1000/base: remove duplicated codes")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Obtained from: DPDK (a8218d0)
(cherry picked from commit 811912c46b5886f1aa3bb7a51a6ec1270bc947a8)
The global hw.em.rx_process_limit knob has been replaced by the device-
specific dev.IF.N.iflib.rx_budget along with the conversion to iflib(4).
While at it, remove the - besides initialization of tx_process_limit -
unused {r,t}x_process_limit members.
(cherry picked from commit 0d6d28ce5650d1cd23dbe4bbac87fb103b3e2d3d)
In rS360398, a new iflib device method was added to opt out of VLAN
events needing an interface reset.
I am switching the default to not requiring a restart for:
* VLAN events
* unknown events
After fixing various bugs, I do not think this would be a common need
of hardware and it is undesirable from the user's perspective causing
link flaps and much slower VLAN configuration. Currently, there are no
other restart events besides VLAN events, and setting the
ifdi_needs_restart default to false will alleviate the need to churn
every driver if an odd event is added in the future for specific
hardware.
markj points out this could cause churn in the other direction; I will
solve that problem with an event registration system as he mentions in
the review should we need it in the future.
These drivers will opt into restart and need further inspection or work:
* ixv (needs code audit, 61a8231 fixed principal issue; re-init probably
not necessary)
* axgbe (needs code audit; re-init probably not necessary)
* iavf - (needs code audit; interaction with Malicious Driver Detection
mentioned in rS360398)
* mgb - no VLAN functions are currently implemented. Left a comment.
MFC after: 2 weeks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41558
Since d49e83eac3, iflib(9) is ready
for this change.
While at it, make isc_driver_version strings (static) const where
not apparently un-const on purpose, too.
This reduces the size of the amd64 GENERIC by about 10 KiB.
This has been off by one in the FreeBSD drivers as far back as I've
looked. Emperically HW and SW emulations I have available don't seem to
mind. Noticed while debugging other issues.
MFC after: 3 days
Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
I am not able to reproduce the issue on lem(4) but disabling there in
abundance of caution in case the issue is not specific to em(4).
Reported by: grog
Most em(4) devices now enjoy TSO and TSO6, matching NetBSD and Linux
defaults.
A prior commit automasks TSO on 10/100 Ethernet due to errata and other
bugs for IPv6 were fixed recently allowing this.
Mike Karels identified a performance anomaly on Intel 82574L devices.
These are multiqueue enabled on FreeBSD since the conversion to
iflib. I am investigating whether this can be fixed, in the mean time
MSI-X with checksum offloads remain default.
i219 SPT devices have an errata that downclocks the DMA engine, which
results in TSO not being able to acheive line rate. Therefore, it is
disabled on:
* Intel(R) I219-LM and I219-V SPT
* Intel(R) I219-LM and I219-V SPT-H (2)
* Intel(R) I219-LM and I219-V LBG (3)
* Intel(R) I219-LM and I219-V SPT (4)
* Intel(R) I219-LM and I219-V SPT (5)
Many lem(4) devices enjoy TSO, exceptions being 82542, 82543, 82547.
TSO6 may be possible for some chipsets but I am still working through
my testing matrix and that is hidden behind hw.em.unsupported_tso.
If you encounter issues, you may disable TSO with for example:
ifconfig em0 -tso -tso6.
I ask to be informed of any deviations from normal operation requiring
this.
Thanks to cc@ for access to emulab.net.
On a sample I219 system it saves about 16% CPU on IPv4 and 19% on IPv6.
iperf3 -Vc reported numbers:
total% user% system%
IPv4 TSO
21.3 7 14.4
21.4 6 15.4
21.5 6 15.5
IPv4 no TSO
36.8 5.4 31.4
38.5 5.1 33.5
38.2 5.7 32.6
IPv4 no TSO no TXCSUM
45.1 5.8 39.3
46 6.3 39.7
46.2 5.9 40.4
IPv6 TSO6
21.7 5.4 16.3
21.6 5.1 16.5
21.9 5.6 16.3
IPv6 no TSO6
41.2 5.2 36
41 5.1 36
40.8 5.2 35.7
IPv6 no TSO6 no TXCSUM6
49 5.9 43.1
48.8 4.9 43.9
49 5.6 43.4
Tested by: cc (lem(4)), karels (82574L)
MFC after: 3 months
Relnotes: yes
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41170
This feature masks TSO capability when a link comes up at 10 or 100mbit
due to errata on the chips. This behavior matches previous versions of
FreeBSD as well as NetBSD and Linux.
A tunable, hw.em.unsupported_tso may be set if the admin desires to
disabling automasking and configure TSO settings manually.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41170
Also disable IPV6 checksum offload.
Spell hw->mac.type < e1000_82543 as e1000_82542. Confusingly, chips
like 82540 and 82541 come later and do not have these issues. There
is no functional change here, as the enum was defined in such a way
it worked correctly. But this reads literally.
MFC after: 1 week
Explicitly set ipcss/ipcse/ipcso for IPv6 per intel SDM as indicated in
inline comments.
Fix and consolidate 82543/82547 hwcsum exemption.
While here rearrange and expand some commentary.
* em(4) obey administrative ifcaps for using hwcsum offload
* em(4) obey administrative ifcaps for hw vlan receive tagging
* em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
* lem(4) obey administrative ifcaps for using hwcsum offload
* lem(4) add support for hw vlan receive tagging
* lem(4) Add ifcaps for TSO offload experimentation, but disabled by
default due to errata and possibly missing txrx code.
* lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
full duplex links. It may still be administratively enabled.
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30072
* em(4) obey administrative ifcaps for using hwcsum offload
* em(4) obey administrative ifcaps for hw vlan receive tagging
* em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
* lem(4) obey administrative ifcaps for using hwcsum offload
* lem(4) add support for hw vlan receive tagging
* lem(4) Add ifcaps for TSO offload experimentation, but disabled by
default due to errata and possibly missing txrx code.
* lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
full duplex links. It may still be administratively enabled.
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30072
Always set TXD_CMD_IP for 82544
Otherwise set TXD_CMD_IP for IPv4, not IPv6
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30072
VLAN 0 essentially means "Treat as untagged, but with priority bits",
and is used by some ISPs.
On igb/em interfaces we did not receive packets with VLAN tag 0 unless
vlanhwfilter was disabled.
This can be fixed by explicitly listing VLAN 0 in the hardware VLAN
filter (VFTA). Do this from em_setup_vlan_hw_support(), where we already
(re-)write the VFTA.
Reviewed by: kbowling
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40046
Ungate DMA clock on TGP and later to avoid packet loss.
A similar fix appears in Linux 639e298f432fb058a9496ea16863f53b1ce935fe
This may be needed as far back as SPT but no confirmation from intel or
other OS yet.
Obtained from: OpenBSD (if_em_hw.c 1.116)
MFC after: 2 weeks
Sponsored by: BBOX.io
This call only makes sense for ich8lan, and the shared code does it in
e1000_setup_init_funcs() above this deletion.
Obtained from: DPDK
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/539
Incrementing these to avoid confusion in users; we are on par with these
out of tree versions.
Reviewed by: erj
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/540
Clear the rings before reset to avoid a HW hang.
Inspired by em-7.7.8 and DPDK (1fc9701238edcf0541289b9ae15565b6d9d7ab30)
Reviewed by: erj
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/540
These include I219 (20) through I219 (23), which ends at Raptor Lake.
This also corrects a discrepancy where the (16) devices should be
mac type "e1000_pch_tgp" and not "e1000_pch_adp".
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
PR: 269224
Reviewed by: erj@
MFC after: 1 day
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D38376