Formally, there are 12 bits for TCP header flags.
Use the accessor functions in more (kernel) places.
No functional change.
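
As a rough illustration of why accessors help here: twelve flag bits do
not fit in a single 8-bit flags field, so a reader or writer has to
combine two fields. The sketch below is a userland model only; the
struct layout and the sketch_ names are assumptions made for
illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only the two fields that carry flag bits. */
struct sketch_tcphdr {
	uint8_t hi_flags;	/* upper 4 of the 12 flag bits (low nibble) */
	uint8_t lo_flags;	/* lower 8 flag bits (FIN, SYN, RST, ...) */
};

/* Return all 12 flag bits as a single value. */
static inline uint16_t
sketch_get_flags(const struct sketch_tcphdr *th)
{
	return ((uint16_t)(((th->hi_flags & 0x0f) << 8) | th->lo_flags));
}

/* Store a 12-bit flag value, splitting it across both fields. */
static inline void
sketch_set_flags(struct sketch_tcphdr *th, uint16_t flags)
{
	assert((flags & ~0x0fff) == 0);	/* only 12 bits are defined */
	th->hi_flags = (flags >> 8) & 0x0f;
	th->lo_flags = flags & 0xff;
}
```

Code that goes through accessors like these never needs to know which
field a given flag bit lives in.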
Reviewed By: cc, #transport, cy, glebius, #iflib, kbowling
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47063
This implementation had various bugs. bde@ reported that the unit
conversion/scaling is wrong, and it also does not handle 82574L or
igb(4) devices correctly.
With the new AIM code, it is expected most users will not need to
manually tune this.
If you do need static control:
Set hw.em.enable_aim=0 for all interfaces at boot, or
dev.em.X.enable_aim=0 for individual interfaces at runtime; those
interfaces will then track the hw.em.max_interrupt_rate tunable. That
codepath has been bugfixed for all supported chipsets.
You may view the current rate with dev.em.X.queue_rx_0.interrupt_rate
which has been bugfixed for all supported chipsets.
If you need to set different rates per interface for some reason let me
know and I will rethink how to add this back. Otherwise, if you have a
mix of concerns, you can leave AIM on for general purpose interfaces and
disable it at runtime on special purpose low or high latency interfaces,
which will then track hw.em.max_interrupt_rate.
PR: 235031
Reported by: Bruce Evans <bde@FreeBSD.org>
MFC after: 3 days
Relnotes: yes
Sponsored by: BBOX.io
A long cable attached to the UART can act as an antenna if disconnected
from the other end. This can cause noise on the receive side, possibly
as reflections from the transmit side, leading to an interrupt storm.
Filter this by adding a threshold of received characters without TX
ready, above which characters are dropped. This is disabled by default,
but has been tested with a threshold of 1000+. A high threshold is
recommended to avoid dropping characters during, for instance, a large
copy/paste from the other end.
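
The filter described above can be sketched roughly as follows. This is
a userland model under stated assumptions: the struct, the field names
and the reset-on-TX-ready behaviour are hypothetical and only meant to
show the shape of the idea, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter state; names are illustrative only. */
struct sketch_rx_filter {
	unsigned threshold;	/* 0 disables the filter (the default) */
	unsigned rx_without_tx;	/* chars received since TX was last ready */
};

/* Returns true if the character should be kept, false if dropped. */
static inline bool
sketch_rx_accept(struct sketch_rx_filter *f, bool tx_ready)
{
	assert(f != NULL);
	if (tx_ready) {
		/* A live peer drains our TX side; start counting afresh. */
		f->rx_without_tx = 0;
		return (true);
	}
	if (f->threshold == 0)
		return (true);		/* filter disabled */
	if (f->rx_without_tx < f->threshold) {
		f->rx_without_tx++;
		return (true);
	}
	return (false);			/* above threshold: treat as noise */
}
```

With a high threshold, a burst such as a large paste is still accepted
in full, while an endless stream of noise is eventually dropped.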
Sponsored by: Juniper Networks, Inc.
Specialize acpi bus_get_domain method to read ivar.
Execute and cache the _PXM result in the ivar at namespace enumeration
time.
If there is no _PXM, the driver for the child can set the ivar to the value
obtained by other means.
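
A minimal userland sketch of the caching scheme, assuming a sentinel
value for "no domain known". All names here are hypothetical and the
has_pxm/pxm_value fields merely stand in for evaluating _PXM; this is
not the ACPI bus code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DOMAIN_UNKNOWN	(-1)	/* assumed sentinel */

struct sketch_child {
	int has_pxm;		/* stand-in: the node has a _PXM method */
	int pxm_value;		/* stand-in: result of evaluating _PXM */
	int domain_ivar;	/* cached domain, read by get_domain */
};

/* Enumeration time: evaluate _PXM once and cache the result. */
static inline void
sketch_enumerate(struct sketch_child *c)
{
	c->domain_ivar = c->has_pxm ? c->pxm_value : SKETCH_DOMAIN_UNKNOWN;
}

/* A child driver may supply a domain obtained by other means. */
static inline void
sketch_set_domain(struct sketch_child *c, int domain)
{
	c->domain_ivar = domain;
}

/* Lookup only reads the ivar; returns 0 on success, -1 if unknown. */
static inline int
sketch_get_domain(const struct sketch_child *c, int *domain)
{
	assert(c != NULL && domain != NULL);
	if (c->domain_ivar == SKETCH_DOMAIN_UNKNOWN)
		return (-1);
	*domain = c->domain_ivar;
	return (0);
}
```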
Move acpi_get_domain() to acpi_pci.c, since it now serves PCI buses and
the devices on them.
Suggested and reviewed by: jhb
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47291
The KASSERT in chn_sleep() can be triggered if more than one thread
wants to sleep on a given channel at the same time. While this is not
really a common scenario, it can be triggered by tools such as stress2,
which use fork(), so that the child process(es) inherit the parent's
FDs.
Fix this by removing CHN_F_SLEEPING altogether, which is not very useful
in the first place:
- CHN_BROADCAST() checks cv_waiters already, so there is no need to
check CHN_F_SLEEPING as well.
- We can check whether cv_waiters is 0 in pcm_killchans(), instead of
whether CHN_F_SLEEPING is not set.
Reported by: dougm, pho (stress2)
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47559
Since SD_F_REGISTERED is cleared at the same time SD_F_DETACHING and
SD_F_DYING are set, and since PCM_DETACHING() is always used in
conjunction with PCM_REGISTERED()/DSP_REGISTERED(), it is enough to just
check SD_F_REGISTERED.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47463
This patch fixes multiple different panic scenarios occurring during
hot-unload:
1. The channel is unlocked in chn_read()/chn_write() for uiomove(9) and
in the meantime we enter pcm_killchans() and free it. By the time we
have returned from userland and try to lock it back, the channel will
have been freed.
2. The parent channel has been freed in pcm_killchans(), but at the same
time, some yet-unstopped vchan's chn_read()/chn_write() calls
chn_start(), which eventually calls vchan_trigger(), which references
the freed parent.
3. PCM_WAIT() panics because it references a freed PCM lock.
For scenarios 1 and 2, refactor pcm_killchans() to first make sure all
channels have been stopped, and then proceed to free them one by one, as
opposed to freeing the first free channel until all channels have been
freed. This change makes the code more robust, but might introduce some
performance overhead when many channels are allocated, since we
continuously loop through the channel list until all of them are
stopped, and then we loop one last time to free them.
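
The two-phase approach for scenarios 1 and 2 can be modelled in
userland roughly like this. The channel struct and function names are
hypothetical, and where the driver would wait and loop again, this
sketch simply returns false so the caller can retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PCM channel; not the driver's struct. */
struct sketch_chan {
	bool running;			/* still in read/write/trigger */
	struct sketch_chan *next;
};

/* Phase 1: report whether every channel on the list has stopped. */
static inline bool
sketch_all_stopped(const struct sketch_chan *head)
{
	for (const struct sketch_chan *c = head; c != NULL; c = c->next) {
		if (c->running)
			return (false);
	}
	return (true);
}

/*
 * Phase 2 runs only once phase 1 succeeds.  Returns false, freeing
 * nothing, while any channel is still running.  The channels are
 * assumed to have been allocated with malloc().
 */
static inline bool
sketch_killchans(struct sketch_chan **headp)
{
	assert(headp != NULL);
	if (!sketch_all_stopped(*headp))
		return (false);
	while (*headp != NULL) {
		struct sketch_chan *c = *headp;

		*headp = c->next;
		free(c);
	}
	return (true);
}
```

Because nothing is freed until every channel is idle, a stopped child
can never be left pointing at a parent that has already gone away.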
For scenario 3, restructure the code so that we can use destroy_dev(9)
instead of destroy_dev_sched(9) in dsp_destroy_dev(). Because
destroy_dev(9) blocks until all references to the device have gone away,
we ensure that the PCM cv and lock will be freed safely.
While here, move the delete_unrhdr(9) calls to pcm_killchans() and
re-order some lines.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch
Differential Revision: https://reviews.freebsd.org/D47462
Consider the following scenario:
1. CHN currently has its trigger set to PCMTRIG_STOP.
2. Thread A locks CHN, calls CHANNEL_TRIGGER(PCMTRIG_START), sets the
trigger to PCMTRIG_START and unlocks.
3. Thread B picks up the lock, calls CHANNEL_TRIGGER(PCMTRIG_ABORT) and
returns a non-zero value, so it returns from chn_trigger() as well.
4. Thread A picks up the lock and adds CHN to the list, which is
_wrong_, because the last call to CHANNEL_TRIGGER() was with
PCMTRIG_ABORT, meaning the channel is stopped, yet we are adding it
to the list and marking it as started.
Another problematic scenario:
1. Thread A locks CHN, sets the trigger to PCMTRIG_ABORT, and unlocks
CHN. It then locks PCM and _removes_ CHN from the list.
2. In the meantime, since thread A unlocked CHN, thread B has locked it,
set the trigger to PCMTRIG_START, unlocked it, and is now blocking on
PCM held by thread A.
3. At the same time, thread C locks CHN, sets the trigger back to
PCMTRIG_ABORT, unlocks CHN, and is also blocking on PCM. However,
once thread A unlocks PCM, because thread C is higher-priority than
thread B, it picks up the PCM lock instead of thread B, and because
CHN is already removed from the list, and thread B hasn't added it
back yet, we take a page fault in CHN_REMOVE() by trying to remove a
non-existent element.
To fix the former scenario, set the channel trigger before the call to
CHANNEL_TRIGGER() (could also come after, doesn't really matter) and
check if anything changed once we lock CHN back.
To fix the latter scenario, use the SAFE variants of CHN_INSERT_HEAD()
and CHN_REMOVE(). A similar scenario can occur in vchan_trigger(), so do
the trigger setting after we've locked the parent channel.
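
To illustrate what a "safe" insert and remove buy us in the latter
scenario, here is a small userland model. It is a sketch under the
assumption that "safe" means "check membership first"; the list type
and the sketch_ functions are hypothetical and do not reproduce the
driver's macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	struct sketch_node *first;
};

static inline bool
sketch_contains(const struct sketch_list *l, const struct sketch_node *n)
{
	for (const struct sketch_node *p = l->first; p != NULL; p = p->next) {
		if (p == n)
			return (true);
	}
	return (false);
}

/* Insert at the head unless already present; true if inserted. */
static inline bool
sketch_insert_head_safe(struct sketch_list *l, struct sketch_node *n)
{
	assert(l != NULL && n != NULL);
	if (sketch_contains(l, n))
		return (false);
	n->next = l->first;
	l->first = n;
	return (true);
}

/* Remove if present; a missing element is a no-op, not a fault. */
static inline bool
sketch_remove_safe(struct sketch_list *l, struct sketch_node *n)
{
	assert(l != NULL && n != NULL);
	for (struct sketch_node **pp = &l->first; *pp != NULL;
	    pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return (true);
		}
	}
	return (false);
}
```

In this model, thread C's late removal of an element that thread A
already removed simply returns false instead of touching stale links.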
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch
Differential Revision: https://reviews.freebsd.org/D47461
Use callout_init_mtx(9) to associate the callback with the driver's
lock. Also make sure the callout is stopped properly during detach.
While here, introduce a dummy_active() function to know when it's
appropriate to stop or not reschedule the callout.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47459
The current quirk is designed to discard duplicated data read from
the chip. The problem is that it also discards real events when they
happen to be identical, which is the case with scroll wheel events:
unlike X/Y, they always move by a fixed offset. The result is a
two-finger scroll that stops mid-way, which could be worked around by
manually setting dev.hms.0.drift_thresh to 0.
To fix that, don't discard duplicates when there's wheel movement.
For users with an actual duplicates problem this will result in scroll
suddenly becoming quite inertial, but it will stop moving at any touch,
so it shouldn't be terrible.
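
The rule boils down to something like the sketch below. The report
struct and function names are hypothetical, and the real quirk has
more state than a plain comparison with the previous report; this only
shows the wheel exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified input report. */
struct sketch_report {
	int dx, dy;		/* pointer movement */
	int wheel;		/* scroll wheel movement */
	unsigned buttons;
};

static inline bool
sketch_same(const struct sketch_report *a, const struct sketch_report *b)
{
	return (a->dx == b->dx && a->dy == b->dy &&
	    a->wheel == b->wheel && a->buttons == b->buttons);
}

/* Discard a duplicate only if it carries no wheel movement. */
static inline bool
sketch_should_discard(const struct sketch_report *prev,
    const struct sketch_report *cur)
{
	assert(prev != NULL && cur != NULL);
	if (cur->wheel != 0)
		return (false);	/* wheel events repeat legitimately */
	return (sketch_same(prev, cur));
}
```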
PR: kern/276709
Reviewed By: wulf
Differential Revision: https://reviews.freebsd.org/D47640
This is a retread of https://reviews.freebsd.org/D34449 which I think
will fix the issue for the remote side not supporting autoneg. We now
attempt an autoneg, and if that fails fall back to the current code
that forces the link speed/duplex.
The original intent of this patch is to inform the remote switch of
duplex settings when we (the client) are specifying a fixed 10 or 100
speed. Otherwise it may get the duplex setting wrong.
The tricky case is when the remote (switch) side is fixing its
speed AND duplex while disabling autoneg and we (client) need to do
the same, which still seems to be common enough at some ISPs.
Original commit message follows:
Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.
By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".
This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.
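
As a sketch of "advertise just the manually configured settings": a
fixed 10 or 100 configuration maps to exactly one advertised ability.
The bit values and names below are hypothetical placeholders, not the
driver's register definitions, and only the 10/100 cases that this
patch adds are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical advertisement bits; not the driver's values. */
#define SK_ADV_10_HALF	0x01u
#define SK_ADV_10_FULL	0x02u
#define SK_ADV_100_HALF	0x04u
#define SK_ADV_100_FULL	0x08u

/* Map a fixed media setting to the single ability to advertise. */
static inline uint8_t
sketch_fixed_media_adv(unsigned speed_mbps, bool full_duplex)
{
	uint8_t adv;

	switch (speed_mbps) {
	case 10:
		adv = full_duplex ? SK_ADV_10_FULL : SK_ADV_10_HALF;
		break;
	case 100:
		adv = full_duplex ? SK_ADV_100_FULL : SK_ADV_100_HALF;
		break;
	default:
		return (0);	/* not covered by this sketch */
	}
	assert((adv & (adv - 1)) == 0);	/* exactly one ability */
	return (adv);
}
```

A remote end doing full autoneg then has only one common mode to pick,
which is the one that was configured by hand.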
The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Tested by: Franco Fichtner <franco@opnsense.org>
Tested by: J.R. Oldroyd <fbsd@opal.com> (previous version)
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47336
On a laptop with no other console devices than the screen, things
scroll off the screen faster than the eye or a camera can capture them.
This tunable slows the console down and makes it update synchronously,
so console output continues when timers or interrupts do not.
Differential Revision: https://reviews.freebsd.org/D47710
Remove the array of port module status and instead save module status
and module number.
At boot, the driver for each PCI function gets an event from fw about
module status. The event contains the module number and module status,
and the driver stores both. When the user (ifconfig) asks for module
information, the driver for each PCI function first queries fw to get
the module number of the current PCI function, then compares it to the
module number it stored before. If they match and the module status is
"plugged and enabled", the driver queries fw for the EPROM information
of that module number and returns it to the caller.
In fact, fw could have determined the required module number of the
current PCI function itself, but fw is not implemented this way. The
current design of PRM/FW is that MCIA register handling is only aware
of modules, not of the PCI function->module connections. FW is designed
to take the module number written to MCIA and write/read the content
to/from the associated module's EPROM.
So, based on the current FW design, we must supply the module number so
fw can find the corresponding I2C interface of the module to
write/read.
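
The check performed before the EPROM query can be modelled as below.
This is an assumption-laden sketch: the enum, the struct and the
function name are hypothetical, and the fw queries themselves are
represented only by the module number passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values; only one of them matters here. */
enum sketch_module_status {
	SK_MODULE_OTHER,
	SK_MODULE_PLUGGED_ENABLED
};

/* What the driver stored from the boot-time fw event. */
struct sketch_port_state {
	int module_num;
	enum sketch_module_status module_status;
};

/*
 * fw_module_num is the module number fw reports for the current PCI
 * function.  The EPROM is queried only if it matches the stored
 * number and the stored status is "plugged and enabled".
 */
static inline bool
sketch_may_query_eprom(const struct sketch_port_state *st,
    int fw_module_num)
{
	assert(st != NULL);
	return (st->module_num == fw_module_num &&
	    st->module_status == SK_MODULE_PLUGGED_ENABLED);
}
```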
Sponsored by: NVidia networking
MFC after: 1 week
Ensure all allocated tags have a hardware context associated.
The hardware context allocation is moved into the zone import
routine, as suggested by kib. This is safe because these zone
allocations are always done in a sleepable context.
I have removed the now pointless num_resources tracking,
and added sysctls / tunables to control UMA zone limits
for these tls tags, as well as a tunable to let the
driver pre-allocate tags at boot.
MFC after: 2 weeks
Change cdev_mgtdev_page_free_page to take an iterator, rather than an
object and page, so that removing the page from the object radix tree
can take advantage of locality with iterators. Define a
general-purpose function to free all pages, which can be used in
several places.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47692
embedfs.S needs the right aarch64 features for BTI and/or PAC.
Obtained from: CheriBSD
Fixes: c2e0d56f5e ("arm64: Support BTI checking in most of the kernel")
Sponsored by: AFRL, DARPA
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows changing the masks so they are appropriate for classical
ECN, and configuring appropriate masks for accurate ECN.
MFC after: 3 days
Sponsored by: Netflix
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows changing the masks so they are appropriate for classical
ECN, and configuring appropriate masks for accurate ECN.
Reviewed by: rrs
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44259
I found I was getting constant device timeouts when doing anything
more complicated than a single SSH on a laptop with RTL8811AU.
After digging into it, I found a variety of fun situations, including
traffic stalls that would recover w/ a shorter (1 second) USB transfer
timeout. However, the big one is a straight up hang of any TX endpoint
until the NIC was reset. The RX side kept going just fine; only the
TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams
on different WME AC's - eg a best effort + bulk transfer, like
browsing the web and doing an ssh clone - throw in a ping -i 0.1
to your gateway, and it would very quickly hit device timeouts every
couple of seconds.
I put everything into a single TX EP and the hangs went away.
Well, mostly.
So after some MORE digging, I found that this driver isn't checking
if the transfers are going into the correct EPs for the packet
WME access category / 802.11 TID; and would frequently be able
to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints
used for setting up the USB device, with .endpoint = UE_ADDR_ANY,
however they're also being set up with the same endpoint configured
in multiple transfer configs. E.g., a NIC with 3 or 4 bulk TX endpoints
will configure the BK and BE endpoints with the same physical endpoint
ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the
above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds -
that way we'll either get a 1 second traffic pause and USB transfer
failure, or a 5 second device timeout. Having both the TX timeout
and the USB transfer timeout made recovery from a USB transfer
timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint;
* separate pending/active buffer tracking per endpoint;
* each endpoint now has its own TX callback to make sure the queue /
end point ID is known;
* and only frames from a given endpoint's pending queue are going
into the active queue and into that endpoint.
* Finally, create a local wme2qid array and populate it with the
endpoint mapping that ensures unique physical endpoint use.
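
The "one transfer per endpoint, with per-endpoint pending/active
tracking" rule can be pictured with the small model below. It is only
a sketch: the struct and functions are hypothetical, queues are
reduced to counters, and nothing here is the driver's USB code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-endpoint state; queues reduced to counters. */
struct sketch_ep {
	unsigned pending;	/* frames waiting for this endpoint */
	bool active;		/* a transfer is in flight */
};

/* Queue a frame on the endpoint chosen for its access category. */
static inline void
sketch_enqueue(struct sketch_ep *ep)
{
	assert(ep != NULL);
	ep->pending++;
}

/* Start a transfer only if this endpoint is idle and has work. */
static inline bool
sketch_start(struct sketch_ep *ep)
{
	assert(ep != NULL);
	if (ep->active || ep->pending == 0)
		return (false);
	ep->pending--;
	ep->active = true;
	return (true);
}

/* Per-endpoint completion: the endpoint is known, so free it up. */
static inline void
sketch_complete(struct sketch_ep *ep)
{
	assert(ep != NULL && ep->active);
	ep->active = false;
}
```

Frames queued on one endpoint never start on another, and a second
start on a busy endpoint is refused until its completion has run.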
Locally tested:
* rtl8812AU, 11n STA mode
* rtl8192EU, 11n STA mode (with diffs to fix the channel config / power
timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows fixing the masks so they are appropriate for classical ECN,
and configuring appropriate masks for accurate ECN.
Michael notes empirically that 82599 has an unexpected middle mask:
  Chip   First  Middle  Last
  82599  0xFF6  0xFF6   0xF7F
which should be fixed up to 0xF76 (RFC 3168) in a future commit.
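
To make the masks concrete, here is a userland sketch of applying a
per-position mask to the 12-bit TCP flags of each TSO segment. The
mask values are the ones quoted above, with 0xF76 as the proposed
middle mask; the flag bit values are the standard TCP ones, while the
sketch_/SK_ names and the struct are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TCP flag bits used in this example. */
#define SK_TH_FIN	0x001
#define SK_TH_PSH	0x008
#define SK_TH_ACK	0x010
#define SK_TH_CWR	0x080

enum sketch_seg { SK_SEG_FIRST, SK_SEG_MIDDLE, SK_SEG_LAST };

/* Hypothetical holder for the three mask values. */
struct sketch_tso_masks {
	uint16_t first, middle, last;
};

/* Flags that a segment at the given position would carry. */
static inline uint16_t
sketch_seg_flags(const struct sketch_tso_masks *m, enum sketch_seg pos,
    uint16_t flags)
{
	uint16_t mask = m->middle;

	assert((flags & ~0x0fff) == 0);	/* only 12 flag bits exist */
	if (pos == SK_SEG_FIRST)
		mask = m->first;
	else if (pos == SK_SEG_LAST)
		mask = m->last;
	return ((uint16_t)(flags & mask));
}
```

With first/middle/last set to 0xFF6/0xF76/0xF7F, a send carrying
ACK|PSH|FIN|CWR keeps CWR only on the first segment and PSH and FIN
only on the last, while middle segments carry just ACK.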
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44258
Once we set that we're doing the inversion workaround, there's no sense
continuing to search for the inversion workaround.
Sponsored by: Netflix
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D47686
When multiple IRQs are specified in a single resource then we only
check the first. Change this to check all interrupts for the value
we expect to find.
Without this we may still enable the interrupt, but it can have the
wrong polarity or trigger. This can cause an interrupt storm if the
interrupt was configured with a level trigger when it should have
been an edge.
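
The difference between the old and new lookup is roughly the
following. This is a userland sketch with hypothetical function names,
treating the resource as a plain array of interrupt numbers rather
than an ACPI resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old behaviour: only the first interrupt in the resource is checked. */
static inline bool
sketch_first_irq_matches(const unsigned *irqs, size_t count,
    unsigned wanted)
{
	assert(irqs != NULL || count == 0);
	return (count > 0 && irqs[0] == wanted);
}

/* New behaviour: every interrupt in the resource is checked. */
static inline bool
sketch_any_irq_matches(const unsigned *irqs, size_t count,
    unsigned wanted)
{
	assert(irqs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (irqs[i] == wanted)
			return (true);
	}
	return (false);
}
```

When the wanted interrupt is not the first entry, the first-only check
misses it, so its polarity and trigger settings are never applied.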
PR: 282241
Reported by: trasz
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D47487
The driver wasn't stable - it would start fine, but during scan
it would eventually hang and no further command endpoint transfers
would complete.
After adding some debugging and looking at the logs I noticed that
things went sideways once a /data/ frame was sent. The channel
change config happened between the data frame being sent and
being completed.
My guess is that the firmware doesn't like a channel change
and reset whilst there's pending data frames. Checking the Linux
driver I found that it was doing a flush before a channel change,
and we're doing it afterwards. This acts like a fence around
ensuring scheduled TX work has completed. In net80211 the
transmit path and the control path aren't serialised, so it's
very often the case that ioctls, state changes, etc occur
whilst in parallel there are frame transmits being scheduled.
This seems to happen more frequently on a more recent, high core
(8) machine with XHCI. I remember testing this driver years ago
on single and dual core CPU laptops with no problems.
So, add some flushes - before a channel change, and during
a transition to AUTH when the BSS config is being programmed into
the firmware. These two fences seem enough to reliably
associate as a 2GHz and 5GHz STA.
Note that this isn't entirely blocking all newly queued
transmit work from occurring until after the NIC has finished
configuration. That will need some further investigation.
Locally tested:
* Wistron NuWeb AR5523 dual-band NIC, STA mode, 2/5GHz
Differential Revision: https://reviews.freebsd.org/D47655