Commit graph

140317 commits

Author SHA1 Message Date
Gleb Smirnoff
ae04d30451 ng_l2tp: use callout_reset() instead of ng_callout()
The previous commit to this node falsely stated that locked callouts
are compatible with netgraph ng_callout KPI.  They are not.  An item
can be queued instead of being applied to the node, which results in
a mutex leak to the callout thread and later unlocked call into function
that expects to be called locked.

Potentially netgraph can be taught to handle locked callouts, but that
would bring a lot of complexity in it.  Instead lets question necessity
of ng_callout() instead of callout_reset().  It protects against node
going away while callout is scheduled.  But a node that drains all
callouts in the shutdown method (ng_l2tp does) is already protected.

Fixes:	89042ff776
2021-12-03 08:57:23 -08:00
Arnaud Ysmal
ea68079ffd Suport Q-in-Q for mvneta. 2021-12-03 11:06:58 +01:00
Konstantin Belousov
a5c2d59ed3 Expand comment explaining reasons for automatic swapoff on shutdown
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33167
2021-12-03 10:42:21 +02:00
Gleb Smirnoff
12ae3476f3 tcp_drain(): initialize the inpcb iterator when curvnet is set
Reported by:	cy
Pointy hat to:	glebius
Fixes:		de2d47842e
2021-12-02 21:08:30 -08:00
Gleb Smirnoff
651a545143 udp_detach(): fix set but not used warning 2021-12-02 20:12:40 -08:00
Gleb Smirnoff
bd1d085045 udp_multi_input(): the UDP header is only needed for probes
Reported by:	kib
Fixes:		de2d47842e
2021-12-02 20:12:40 -08:00
Gleb Smirnoff
4b4cce02ac xhci: add PCI IDs for USB controllers found on Supermicro M12SWA-TF 2021-12-02 20:12:33 -08:00
Alexander Motin
2dfc1f7355 APEI: Improve multiple error sources handling.
Some AMD systems I have report 8 NMI and 3591 polled error sources.
Previous code could handle only one NMI source and used separate
callout for each polled source.  New code can handle multiple NMIs
and groups polled sources by power of 2 of the polling period.

MFC after:	2 weeks
2021-12-02 18:06:12 -05:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Warner Losh
a10253cffe mps(4): Fix unmatched devq release.
Port 9781c28c6d and a8837c77ef to the mps driver.  Before this
change devq was frozen only if some command was sent to the target after
reset started, but release was called always.  This change freezes the
devq immediately, leaving mprsas_action_scsiio() check only to cover
race condition due to different lock devq use.

This should also avoid unnecessary requeue of the commands, creating
additional log noise and confusing some broken apps. It also avoids a
'busy' requeue of I/Os failing when we're doing recovery that takes
longer than the normal busy timeout. These I/Os failing can lead to
filesystems being unmounted in the force unmount case for I/O errors.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D33228
2021-12-02 13:53:44 -07:00
Gleb Smirnoff
3cce6164ab ip_input: remove pointless check in INP_RECVIF handling
An mbuf rcvif pointer is supposed to be valid and doesn't
need extra checks.  The code appeared in d314ad7b73.
2021-12-02 11:15:04 -08:00
Gleb Smirnoff
d96fccc505 epoch: with EPOCH_TRACE add epoch_where_report()
which will report where the epoch was entered and also
mark the tracker, so that exit will also be reported.

Helps to understand epoch entrance/exit scenarios in
complex cases, like network stack.  As everything else
under EPOCH_TRACE it is a developer only tool.
2021-12-02 11:02:51 -08:00
Gleb Smirnoff
9e93d2b335 ifnet: enable & fix if_debug build
Fixes:	ce40632a31
2021-12-02 10:59:43 -08:00
Gleb Smirnoff
2e27230ff9 tcp_hpts: rewrite inpcb synchronization
Just trust the pcb database, that if we did in_pcbref(), no way
an inpcb can go away.  And if we never put a dropped inpcb on
our queue, and tcp_discardcb() always removes an inpcb to be
dropped from the queue, then any inpcb on the queue is valid.

Now, to solve LOR between inpcb lock and HPTS queue lock do the
following trick.  When we are about to process a certain time
slot, take the full queue of the head list into on stack list,
drop the HPTS lock and work on our queue.  This of course opens
a race when an inpcb is being removed from the on stack queue,
which was already mentioned in comments.  To address this race
introduce generation count into queues.  If we want to remove
an inpcb with generation count mismatch, we can't do that, we
can only mark it with desired new time slot or -1 for remove.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33026
2021-12-02 10:48:49 -08:00
Gleb Smirnoff
f971e79139 tcp_hpts: rename input queue to drop queue and trim dead code
The HPTS input queue is in reality used only for "delayed drops".
When a TCP stack decides to drop a connection on the output path
it can't do that due to locking protocol between main tcp_output()
and stacks.  So, rack/bbr utilize HPTS to drop the connection in
a different context.

In the past the queue could also process input packets in context
of HPTS thread, but now no stack uses this, so remove this
functionality.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33025
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
b0a7c008cb tcp_hpts: make struct tcp_hpts_entry private to the module.
Also, make some of the functions also private to the module. Remove
unused functions discovered after that.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33024
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
50f081ecb7 tcp_hpts: provide tcp_in_hpts().
It will hide some internal HPTS knowledge from the consumers.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33023
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it.  Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some compexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
565655f4e3 inpcb: reduce some aliased functions after removal of PCBGROUP.
Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33021
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
93c67567e0 Remove "options PCBGROUP"
With upcoming changes to the inpcb synchronisation it is going to be
broken. Even its current status after the move of PCB synchronization
to the network epoch is very questionable.

This experimental feature was sponsored by Juniper but ended never to
be used in Juniper and doesn't exist in their source tree [sjg@, stevek@,
jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix
[gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].

I'm up to resurrecting it back if there is any interest from anybody.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33020
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
1cec1c5831 Allow to compile RSS without PCBGROUP.
Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33019
2021-12-02 10:48:48 -08:00
Don Morris
8f82dc8dd3 hyperv: Flag hn and storvsc statistics with CTLFLAG_STATS.
Reviewed by:    vangyzen, whu, bdrewery
Sponsored by:	Dell EMC
Differential Revision: https://reviews.freebsd.org/D30060
2021-12-02 10:46:36 -08:00
Gordon Bergling
25d0ccbe10 bce(4): Fix a typo in a sysctl description
- s/duirng/during/

MFC after:	3 days
2021-12-02 16:12:34 +01:00
Randall Stewart
dcf2dfed26 tcp: unloading a module that is set to default should error.
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
Its supposed to be an EBUSY error. Make it so.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
2021-12-02 06:12:16 -05:00
Gordon Bergling
fe96f62d61 kern: Correct a typo in a sysctl description
- s/osbolete/obsolete/

MFC after:	3 days
2021-12-02 10:54:15 +01:00
Gordon Bergling
c937fb286f mlx5: Correct a typo in a sysctl description
- s/parameteres/parameters/

MFC after:	3 days
2021-12-02 10:51:18 +01:00
Hubert Mazur
f89f6f9581 TMP461: Add thermal sensor driver
Add driver for TMP461 thermal sensor. Register new sysctl node
of integer type for device. Read register and fill sysctl with
valid temperature.

Reviewed by:
Sponsored by:		Alstom
Obtained from:		Semihalf
Differential revision:	https://reviews.freebsd.org/D32818
2021-12-02 09:18:48 +01:00
Hubert Mazur
efa5d3831b sys/libkern.h: Add type conversion helpers
Add helper functions for 32 and 64 bit unsigned to signed integers
conversions.

Reviewed by:
Sponsored by:		Alstom
Obtained from:		Semihalf
Differential revision:	https://reviews.freebsd.org/D33162
2021-12-02 09:18:48 +01:00
Rick Macklem
fd020f197d nfsd: Sanity check the ACL attribute
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.

The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.

Reported by:	rtm@lcs.mit.edu
Tested by:	rtm@lcs.mit.edu
PR:	260111
MFC after:	2 weeks
2021-12-01 13:55:17 -08:00
Rick Macklem
33d0be8a92 nfsd: Do not try to cache a reply for NFSERR_BADSLOT
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid.  As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.

Reported by:	rtm@lcs.mit.edu
Tested by:	rtm@lcs.mit.edu
PR:	260076
MFC after:	2 weeks
2021-12-01 13:46:41 -08:00
Vincenzo Maffione
d0633af765 em: skip rxcsum offload processing when disabled
Similarly to the other Intel drivers, don't try to process
RX checksum offloads when this feature (IFCAP_RXCSUM) is
disabled.

Reviewed by:	gallatin, kbowling, erj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33155
2021-12-01 21:10:46 +00:00
Vincenzo Maffione
d91559564d e1000: remove unused ifp backpointer
The ifp (struct ifnet) backpointer in the e1000 private ifnet
data is not used anymore since the iflib transition.
Remove it so that developers are not tempted to use it and
get a NULL pointer dereference.

Reviewed by:	markj, kbowling, erj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33157
2021-12-01 21:08:31 +00:00
Alexander Motin
94a72c5ac4 amdtemp: Revert related part of "Make CPU children" commit.
While it still looks like previous code worked by coincidence, this
change broke things even more instead of fixing.

Reported by:	avg@
MFC after:	1 week
2021-12-01 13:04:31 -05:00
Zhenlei Huang
73d41cc730 if_epair: Also mark the flag of pair b with IFF_KNOWSEPOCH
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33210
2021-12-01 15:54:23 +01:00
Michael Tuexen
13c196a41e sctp: improve handling of assoc ids in socket options
For socket options related to local and remote addresses providing
generic association ids does not make sense. Report EINVAL in this
case.

MFC after:	1 week
2021-12-01 14:54:55 +01:00
Michael Tuexen
a01b8859cb sctp: cleanup, no functional change intended.
MFC after:	1 week
2021-12-01 09:19:40 +01:00
Warner Losh
1c7d15b030 Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire
Giant before calling device_busy/unbusy. However, if multiple threads
are opening a file, say, that causes the device to busy/unbusy, then we
can race to the root marking things busy. Move to using a reference
count to keep track of how many times a device_t has been made busy. Use
that count to make the same decisions that we'd make with the old device
state.

Note: gpiopps.c uses D_TRACKCLOSE. Others do as well. However, there's a
known race with closes that will be corrected for all the drivers that
do this in a future commit.

Sponsored by:		Netflix
Reviewed by:		hselasky, jhb
Differential Revision:	https://reviews.freebsd.org/D26284
2021-11-30 15:18:01 -07:00
Warner Losh
25c49c426c Revert "Make device_busy/unbusy work w/o Giant held"
This reverts commit 08e7819153.

Commit message was for a very old version of the patch. Will re-commit
with the right one since it's so bad. There's no locked versions of
it...that code was reworked to use refcnt APIs.

Noticed by:	jhb, jtrc27
Sponsored by:	Netflix
2021-11-30 15:17:07 -07:00
Warner Losh
08e7819153 Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire Giant
before calling device_busy/unbusy. However, if multiple threads are opening a
file, say, that causes the device to busy/unbusy, then we can race to the root
marking things busy. Create a new device_busy_locked and device_unbusy_locked
that are the current implemntations of device_busy and device_unbusy. Make
device_busy and unbusy acquire Giant before calling the _locked versrions. Since
we never sleep in the busy/unbusy path, Giant's single threaded semantics
suffice to keep this safe.

Sponsored by:		Netflix
Reviewed by:		hselasky, jhb
Differential Revision:	https://reviews.freebsd.org/D26284
2021-11-30 15:03:26 -07:00
Warner Losh
3846662dab cam: Initialize wired to false
As part of converting the code to a while loop, the unconditional
initialization of wired to false was lost.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D33163
2021-11-30 14:40:30 -07:00
Vladimir Kondratyev
c508b0818b iichid(4): Perform acknowledgement of I2C device interrupt after RESET command
in sampling mode to workaround firmware bug.

This fixes reboot or poweroff on frame.work laptops after first touch.

Reported by:	many
PR:		259230
MFC after:	1 week
Tested by:	kevans, markj
2021-12-01 00:29:50 +03:00
Elliott Mitchell
d893d9e94d xen/dev: remove write-only variable
This was found while looking for driver_filter_t functions which got the
trap frame from the argument.  This particular instance it isn't even
used, so remove now lest someone else get to it first.

Reviewed by:	mhorne
2021-11-30 17:11:57 -04:00
Kristof Provost
439da7f06d if_stf: KASAN fix
In in_stf_input() we grabbed a pointer to the IPv4 header and later did
an m_pullup() before we look at the IPv6 header. However, m_pullup()
could rearrange the mbuf chain and potentially invalidate the pointer to
the IPv4 header.

Avoid this issue by copying the IP header rather than getting a pointer
to it.

Reported by:	markj, Jenkins (KASAN job)
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33192
2021-11-30 17:35:15 +01:00
Mitchell Horne
0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mitchell Horne
b02908b051 arm64, powerpc: fix calculation of 'used' in GET_STACK_USAGE
We do not consider the space reserved for the pcb to be part of the
total kstack size, so it should not be included in the calculation of
the used stack size.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-30 11:15:44 -04:00
Mitchell Horne
8bc792b384 i386: take pcb and fpu area into account in GET_STACK_USAGE
On this platform, the pcb and FPU save area are allocated from the top
of each kernel stack, so they should be excluded from the calculation of
the total and used stack sizes.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32581
2021-11-30 11:03:46 -04:00
Bjoern A. Zeeb
b394e16ef0 fw_stub: fix -Wunused-but-set-variable for firmware files
In case we are only embedding a single firmware image the variable
"parent" gets set but never used.  Add checks for the number of files
for it and only print it out if we are exceeding the single file count.
This fixes -Wunused-but-set-variable warnings for the majority of
firmware files in the tree.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2021-11-30 14:23:18 +00:00
Andriy Gapon
3d9d64aa18 kern_tc: unify timecounter to bintime delta conversion
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small.  But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different.  Both functions use
approximate calculations based on th_scale that avoid division.  Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be.  tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.

As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta).  Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off.  tc_windup does the switch using its own
calculations of a new th_offset using the large delta.  As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime.  So, for a period of time new binuptime values may
be "back in time" comparing to values just before the switch.

Such a jump must never happen.  All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken.  For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it.  In that case the target thread
would never get woken up.

The unified calculations should ensure the monotonic property of the
uptime.

The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100).  But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Panzura LLC
2021-11-30 15:23:23 +02:00
Gordon Bergling
1b0602f2db mips: Fix a typo in a source code comment
- s/segement/segment/

MFC after:	3 days
2021-11-30 10:41:46 +01:00