The previous commit to this node falsely stated that locked callouts
are compatible with the netgraph ng_callout() KPI. They are not. An item
can be queued instead of being applied to the node, which results in
a mutex leak to the callout thread and a later unlocked call into a
function that expects to be called locked.
Netgraph could potentially be taught to handle locked callouts, but that
would add a lot of complexity. Instead, let's question the necessity
of using ng_callout() rather than callout_reset(). It protects against
the node going away while a callout is scheduled. But a node that drains
all callouts in its shutdown method (as ng_l2tp does) is already protected.
Fixes: 89042ff776
Some AMD systems I have report 8 NMI and 3591 polled error sources.
Previous code could handle only one NMI source and used separate
callout for each polled source. New code can handle multiple NMIs
and groups polled sources by power of 2 of the polling period.
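A minimal sketch of the grouping idea, assuming each polled source is
assigned to a bucket indexed by the power of 2 of its polling period, so
one callout per bucket can serve every source in it. The function name,
the millisecond unit and the round-down choice are illustrative
assumptions, not taken from the driver:

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a polling period to a bucket index equal to
 * floor(log2(period)).  Rounding down is an assumption of this sketch.
 */
static unsigned
poll_bucket(uint32_t period_ms)
{
	unsigned b = 0;

	while (period_ms > 1) {
		period_ms >>= 1;
		b++;
	}
	return (b);
}
```

With this mapping, periods of 512..1023 ms all share bucket 9, so
thousands of sources collapse into at most 32 groups.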
MFC after: 2 weeks
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
Port 9781c28c6d and a8837c77ef to the mps driver. Before this
change the devq was frozen only if some command was sent to the target
after the reset started, but the release was always called. This change
freezes the devq immediately, leaving the mprsas_action_scsiio() check
only to cover the race condition due to the different lock the devq uses.
This should also avoid unnecessary requeue of the commands, creating
additional log noise and confusing some broken apps. It also avoids a
'busy' requeue of I/Os failing when we're doing recovery that takes
longer than the normal busy timeout. These I/Os failing can lead to
filesystems being unmounted in the force unmount case for I/O errors.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D33228
which will report where the epoch was entered and also
mark the tracker, so that exit will also be reported.
Helps to understand epoch entrance/exit scenarios in
complex cases, like network stack. As everything else
under EPOCH_TRACE it is a developer only tool.
Just trust the pcb database: if we did in_pcbref(), there is no way
an inpcb can go away. And if we never put a dropped inpcb on
our queue, and tcp_discardcb() always removes an inpcb to be
dropped from the queue, then any inpcb on the queue is valid.
Now, to solve the LOR between the inpcb lock and the HPTS queue lock, do
the following trick. When we are about to process a certain time
slot, move the whole queue from the head list onto an on-stack list,
drop the HPTS lock and work on our queue. This of course opens
a race when an inpcb is being removed from the on-stack queue,
which was already mentioned in comments. To address this race,
introduce a generation count into the queues. If we want to remove
an inpcb with a generation count mismatch, we can't do that; we
can only mark it with the desired new time slot, or -1 for removal.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33026
The HPTS input queue is in reality used only for "delayed drops".
When a TCP stack decides to drop a connection on the output path
it can't do that due to the locking protocol between main tcp_output()
and stacks. So, rack/bbr utilize HPTS to drop the connection in
a different context.
In the past the queue could also process input packets in context
of HPTS thread, but now no stack uses this, so remove this
functionality.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33025
Also, make some of the functions private to the module. Remove
unused functions discovered after that.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33024
With the introduction of epoch(9) synchronization to the network stack,
the inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc). However, inpcbs
aren't static in nature: they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.
A fairly new feature of uma(9), Safe Memory Reclamation, allows memory
to be freed safely in page-sized batches, with virtually zero
overhead compared to uma_zfree(). However, unlike epoch(9), it
puts a stricter requirement on access to the protected memory:
a critical(9) section is needed to access it. Details:
- The database is already built on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database an SMR section is now required.
Once the desired inpcb is found we need to transition from the SMR
section to the r/w lock on the inpcb itself, with a check that the inpcb
isn't yet freed. This requires some complexity, since the SMR section
itself is a critical(9) section. The complexity is hidden from
KPI users in inp_smr_lock().
- For an inpcb list traversal (a pcblist sysctl, or broadcast
notification) a new KPI is also provided that hides the internals of
the database - inp_next(struct inp_iterator *).
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33022
With upcoming changes to the inpcb synchronisation it is going to be
broken. Even its current status after the move of PCB synchronization
to the network epoch is very questionable.
This experimental feature was sponsored by Juniper but ended up never
being used at Juniper and doesn't exist in their source tree [sjg@, stevek@,
jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix
[gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
I'm up for resurrecting it if there is any interest from anybody.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33020
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
It's supposed to be an EBUSY error. Make it so.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
Add a driver for the TMP461 thermal sensor. Register a new sysctl node
of integer type for the device. Read the register and fill the sysctl
with a valid temperature.
Reviewed by:
Sponsored by: Alstom
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D32818
Add helper functions for 32- and 64-bit unsigned-to-signed integer
conversions.
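As an illustration only, such helpers might look like the sketch below.
The names u32_to_s32() and u64_to_s64() are hypothetical, and the
bit-for-bit two's complement reinterpretation is an assumption of this
sketch, not necessarily what the committed helpers do:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers: reinterpret the bits of an unsigned value as a
 * two's complement signed value.  memcpy() avoids the
 * implementation-defined behaviour of an out-of-range cast.
 */
static int32_t
u32_to_s32(uint32_t v)
{
	int32_t r;

	memcpy(&r, &v, sizeof(r));
	return (r);
}

static int64_t
u64_to_s64(uint64_t v)
{
	int64_t r;

	memcpy(&r, &v, sizeof(r));
	return (r);
}
```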
Reviewed by:
Sponsored by: Alstom
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D33162
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.
The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.
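A rough sketch of the kind of sanity checks meant here, assuming the
parser knows how many bytes of the request remain. The function names
and the WHO_MAX_BYTES limit are hypothetical; only the 16-byte minimum
ACE size (type, flag, access mask and "who" length, 32 bits each)
follows from the NFSv4 wire format:

```c
#include <stddef.h>
#include <stdint.h>

#define	ACE_MIN_BYTES	16u	/* type, flag, mask, who length */
#define	WHO_MAX_BYTES	1024u	/* hypothetical upper bound */

/* An acecnt is plausible only if that many minimal ACEs could fit. */
static int
acecnt_is_sane(uint32_t acecnt, size_t bytes_left)
{

	return (acecnt <= bytes_left / ACE_MIN_BYTES);
}

/* A "who" string must fit both the limit and the remaining data. */
static int
wholen_is_sane(uint32_t wholen, size_t bytes_left)
{

	return (wholen <= WHO_MAX_BYTES && wholen <= bytes_left);
}
```

Rejecting the request as soon as either check fails keeps a hostile
acecnt or length from driving allocations or reads past the request.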
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260111
MFC after: 2 weeks
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks
Similarly to the other Intel drivers, don't try to process
RX checksum offloads when this feature (IFCAP_RXCSUM) is
disabled.
Reviewed by: gallatin, kbowling, erj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33155
The ifp (struct ifnet) backpointer in the e1000 private ifnet
data is not used anymore since the iflib transition.
Remove it so that developers are not tempted to use it and
get a NULL pointer dereference.
Reviewed by: markj, kbowling, erj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33157
While it still looks like the previous code worked by coincidence, this
change broke things even more instead of fixing them.
Reported by: avg@
MFC after: 1 week
For socket options related to local and remote addresses, providing
generic association IDs does not make sense. Report EINVAL in this
case.
MFC after: 1 week
The vast majority of the busy/unbusy users in the tree don't acquire
Giant before calling device_busy/unbusy. However, if multiple threads
are opening a file, say, that causes the device to busy/unbusy, then we
can race to the root marking things busy. Move to using a reference
count to keep track of how many times a device_t has been made busy. Use
that count to make the same decisions that we'd make with the old device
state.
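A user-space sketch of the reference-counting idea, using C11 atomics
in place of the kernel refcount API. struct dev and the dev_*() names
are hypothetical stand-ins for device_t and its functions:

```c
#include <stdatomic.h>
#include <stddef.h>

struct dev {			/* hypothetical stand-in for device_t */
	struct dev *parent;
	atomic_uint busy;	/* number of outstanding busy requests */
};

static void
dev_busy(struct dev *d)
{

	/* Only the 0 -> 1 transition is propagated up the tree. */
	if (atomic_fetch_add(&d->busy, 1) == 0 && d->parent != NULL)
		dev_busy(d->parent);
}

static void
dev_unbusy(struct dev *d)
{

	/* Only the 1 -> 0 transition releases the parent. */
	if (atomic_fetch_sub(&d->busy, 1) == 1 && d->parent != NULL)
		dev_unbusy(d->parent);
}

static int
dev_is_busy(struct dev *d)
{

	return (atomic_load(&d->busy) != 0);
}
```

Because each level is updated atomically, two threads busying the same
device no longer depend on Giant to keep the count consistent.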
Note: gpiopps.c uses D_TRACKCLOSE. Others do as well. However, there's a
known race with closes that will be corrected for all the drivers that
do this in a future commit.
Sponsored by: Netflix
Reviewed by: hselasky, jhb
Differential Revision: https://reviews.freebsd.org/D26284
This reverts commit 08e7819153.
Commit message was for a very old version of the patch. Will re-commit
with the right one since it's so bad. There are no locked versions of
it; that code was reworked to use the refcnt APIs.
Noticed by: jhb, jrtc27
Sponsored by: Netflix
The vast majority of the busy/unbusy users in the tree don't acquire Giant
before calling device_busy/unbusy. However, if multiple threads are opening a
file, say, that causes the device to busy/unbusy, then we can race to the root
marking things busy. Create a new device_busy_locked and device_unbusy_locked
that are the current implementations of device_busy and device_unbusy. Make
device_busy and unbusy acquire Giant before calling the _locked versions. Since
we never sleep in the busy/unbusy path, Giant's single threaded semantics
suffice to keep this safe.
Sponsored by: Netflix
Reviewed by: hselasky, jhb
Differential Revision: https://reviews.freebsd.org/D26284
As part of converting the code to a while loop, the unconditional
initialization of wired to false was lost.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33163
in sampling mode to workaround firmware bug.
This fixes reboot or poweroff on frame.work laptops after first touch.
Reported by: many
PR: 259230
MFC after: 1 week
Tested by: kevans, markj
This was found while looking for driver_filter_t functions which got the
trap frame from the argument. In this particular instance it isn't even
used, so remove it now lest someone else get to it first.
Reviewed by: mhorne
In in_stf_input() we grabbed a pointer to the IPv4 header and later did
an m_pullup() before we look at the IPv6 header. However, m_pullup()
could rearrange the mbuf chain and potentially invalidate the pointer to
the IPv4 header.
Avoid this issue by copying the IP header rather than getting a pointer
to it.
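The hazard and the fix can be modelled in user space. In this sketch
struct pkt, struct hdr4 and pkt_pullup() are toy stand-ins (assumptions
for illustration, not the mbuf API): pkt_pullup() always moves the data
to a new allocation, just as m_pullup() may, so only a copy of the
header taken beforehand stays valid:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct hdr4 {			/* toy header, not the real struct ip */
	uint8_t ttl;
	uint8_t proto;
};

struct pkt {			/* toy stand-in for an mbuf */
	unsigned char *data;
	size_t len;
};

/* Like m_pullup(), this may move the data and invalidate old pointers. */
static int
pkt_pullup(struct pkt *p)
{
	unsigned char *n;

	n = malloc(p->len);
	if (n == NULL)
		return (-1);
	memcpy(n, p->data, p->len);
	free(p->data);
	p->data = n;
	return (0);
}

static int
read_proto_after_pullup(struct pkt *p, uint8_t *proto)
{
	struct hdr4 h;

	if (p->len < sizeof(h))
		return (-1);
	memcpy(&h, p->data, sizeof(h));	/* copy, do not keep a pointer */
	if (pkt_pullup(p) != 0)
		return (-1);
	*proto = h.proto;		/* safe: reads the local copy */
	return (0);
}
```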
Reported by: markj, Jenkins (KASAN job)
Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33192
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
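How a caller might act on the total and used stack sizes can be
sketched as below. The function name and the "reserve" threshold are
hypothetical; the actual policies in GEOM and netgraph may differ:

```c
#include <stddef.h>

/*
 * Hypothetical policy: dispatch directly only while at least "reserve"
 * bytes of kernel stack remain; otherwise queue for a worker thread.
 */
static int
can_dispatch_directly(size_t total, size_t used, size_t reserve)
{

	if (used > total)
		return (0);
	return (total - used >= reserve);
}
```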
PR: 259157
Reviewed by: mav, kib, markj (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32580
We do not consider the space reserved for the pcb to be part of the
total kstack size, so it should not be included in the calculation of
the used stack size.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
On this platform, the pcb and FPU save area are allocated from the top
of each kernel stack, so they should be excluded from the calculation of
the total and used stack sizes.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32581
In case we are only embedding a single firmware image the variable
"parent" gets set but never used. Add checks for the number of files
for it and only print it out if we are exceeding the single file count.
This fixes -Wunused-but-set-variable warnings for the majority of
firmware files in the tree.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small. But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different. Both functions use
approximate calculations based on th_scale that avoid division. Both
produce values slightly greater than the true value that would be
calculated with division by tc_frequency. tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than the bintime_off result.
As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta). Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off. tc_windup does the switch using its own
calculations of a new th_offset using the large delta. As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime. So, for a period of time new binuptime values may
be "back in time" compared to values just before the switch.
Such a jump must never happen. All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken. For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it. In that case the target thread
would never get woken up.
The unified calculations should ensure the monotonic property of the
uptime.
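The idea of a single shared conversion can be sketched as follows.
This is a simplified model, not the kernel code: struct bt stands in
for struct bintime, delta_to_bt() is a hypothetical name, and scale is
assumed to approximate 2^64 divided by the counter frequency:

```c
#include <stdint.h>

struct bt {			/* simplified stand-in for struct bintime */
	uint64_t sec;
	uint64_t frac;		/* fraction of a second, units of 2^-64 */
};

/*
 * Compute scale * delta exactly as a 64.64 fixed-point value, without
 * division or 128-bit arithmetic, by splitting scale into 32-bit halves.
 */
static struct bt
delta_to_bt(uint64_t scale, uint32_t delta)
{
	struct bt r;
	uint64_t hi, lo;

	hi = (scale >> 32) * delta;
	lo = (scale & 0xffffffffULL) * delta;
	r.sec = hi >> 32;
	r.frac = hi << 32;
	r.frac += lo;
	if (r.frac < lo)	/* carry out of the fraction */
		r.sec++;
	return (r);
}
```

If both the reader path and the windup path call the same function,
a given delta always maps to the same bintime, so switching time hands
cannot by itself move the clock backwards.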
The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100). But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Panzura LLC