Use get_pcpu() instead of an open-coded pcpu_find(td->td_oncpu). This
eliminates some memory accesses and results in a shorter instruction
sequence. Note that get_pcpu() didn't exist when rmlocks were added.
Reviewed by: jah, mjg
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c84bb8cd77)
ler@, markj@ reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:
- In nfscl_cleanup_common(), "own" is the owner string
owp->nfsow_owner. If we free that particular
owner structure, then in subsequent comparisons
"own" will point to freed memory.
- nfscl_cleanup_common() can free more than one owner, so the use
of LIST_FOREACH_SAFE() in nfscl_cleanupkext() is not sufficient.
I also believe there is a 3rd:
- If nfscl_freeopenowner() or nfscl_freelockowner() is called
without the NFSCLSTATE mutex held, this could race with
nfscl_cleanupkext().
This could happen when the exclusive lock is held
on the client, such as when delegations are being returned
or when recovering from NFSERR_EXPIRED.
This patch fixes them as follows:
1 - Copy the owner string to a local variable before the
nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() so that it will never free more
than the first matching element. Normally there should only
be one element in each list with a matching open/lock owner
anyhow (but there might be a bug that results in a duplicate).
This should guarantee that the FOREACH_SAFE loops in
nfscl_cleanupkext() are adequate.
3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner()
and nfscl_freelockowner(), if it is not already held.
This serializes all of these calls with the ones done in
nfscl_cleanup_common().
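A minimal userland sketch of fixes 1 and 2, assuming a hand-rolled
singly linked list in place of the real NFS client lists. The names
struct owner, OWNER_LEN, push_owner(), owner_count(),
free_first_match() and cleanup_owner() are hypothetical and are not
taken from the NFS client code; the locking in fix 3 is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OWNER_LEN 12	/* hypothetical key size, for illustration only */

struct owner {
	struct owner *next;
	unsigned char key[OWNER_LEN];
};

/* Test helper: prepend a node whose key is the (zero-padded) name. */
static void
push_owner(struct owner **head, const char *name)
{
	struct owner *o = calloc(1, sizeof(*o));
	size_t n = strlen(name);

	if (o == NULL)
		return;
	if (n > OWNER_LEN)
		n = OWNER_LEN;
	memcpy(o->key, name, n);
	o->next = *head;
	*head = o;
}

static size_t
owner_count(const struct owner *o)
{
	size_t n = 0;

	for (; o != NULL; o = o->next)
		n++;
	return (n);
}

/* Fix 2: free at most the first matching node, then stop walking. */
static int
free_first_match(struct owner **head, const unsigned char *key)
{
	struct owner **pp, *o;

	for (pp = head; (o = *pp) != NULL; pp = &o->next) {
		if (memcmp(o->key, key, OWNER_LEN) == 0) {
			*pp = o->next;
			free(o);
			return (1);
		}
	}
	return (0);
}

/* Fix 1: copy the key to a local before the node it lives in can be freed. */
static int
cleanup_owner(struct owner **head, const struct owner *victim)
{
	unsigned char own[OWNER_LEN];

	memcpy(own, victim->key, OWNER_LEN);
	return (free_first_match(head, own));
}
```

Because the key is compared from the local copy, it no longer matters
whether the node that supplied it is the one that gets freed.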
(cherry picked from commit 1cedb4ea1a)
Some boards use the dwc phy in MII mode, so do not fail to attach if this is
the case.
Only rockchip code uses the phy mode to program some custom syscon register.
PR: 260848
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
(cherry picked from commit da6252a6a0)
Currently there are five quirks the USB stack tries to automagically detect:
- UQ_MSC_NO_PREVENT_ALLOW
- UQ_MSC_NO_SYNC_CACHE
- UQ_MSC_NO_TEST_UNIT_READY
- UQ_MSC_NO_GETMAXLUN
- UQ_MSC_NO_START_STOP
If any of the quirks above are set, no further quirks will be probed.
If any of the USB mass storage tests fail, the USB device is
re-enumerated as a last resort to clear any error states from the
device. Then the USB stack will try to probe and attach the umass<N>
device passing the detected quirks.
While at it, give more details in dmesg about what is going on.
Tested by: several
Submitted by: Idwer Vollering <vidwer_fbsdbugs@gmail.com>
Differential Revision: https://reviews.freebsd.org/D30919
Sponsored by: NVIDIA Networking
(cherry picked from commit 7520b88860)
The IBTA specification has a new speed, NDR, which supports a signaling
rate of 100Gb. The mlx5 IB driver translates link modes reported by the
ConnectX device to IB speed and width. Add translation of the new 100Gb,
200Gb and 400Gb link modes to the NDR IB type and a width of x1, x2 or x4,
respectively.
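The width mapping can be sketched as follows. This is only an
illustration: enum ib_width and ndr_width_for_gbps() are hypothetical
names, and the real driver works from link mode bits reported by the
device rather than from a rate in Gb/s.

```c
#include <assert.h>

/* Hypothetical width encoding; not the values used by the IB stack. */
enum ib_width { WIDTH_NONE = 0, WIDTH_1X = 1, WIDTH_2X = 2, WIDTH_4X = 4 };

/* Map an NDR-class link rate to a lane count, at 100Gb per lane. */
static enum ib_width
ndr_width_for_gbps(unsigned gbps)
{
	switch (gbps) {
	case 100:
		return (WIDTH_1X);
	case 200:
		return (WIDTH_2X);
	case 400:
		return (WIDTH_4X);
	default:
		return (WIDTH_NONE);	/* not an NDR rate in this sketch */
	}
}
```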
Linux commits:
f946e45f59ef01ff54ffb3b1eba3a8e7915e7326
Sponsored by: NVIDIA Networking
(cherry picked from commit 91c8ffd7e6)
Due to misplaced braces, an error from vfs_uninit() in the VFCF_SBDRY
case was ignored.
Reported by: Anton Rang <rang@acm.org>
Reviewed by: Anton Rang <rang@acm.org>, markj
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D34375
(cherry picked from commit 1517b8d5a7)
The ACPI spec describes the FADT->Century field as:
The RTC CMOS RAM index to the century of data value (hundred and
thousand year decimals). If this field contains a zero, then the
RTC centenary feature is not supported. If this field has a non-zero
value, then this field contains an index into RTC RAM space that
OSPM can use to program the centenary field.
Use this field to decide whether to program the CENTURY register
of the CMOS RTC device.
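A rough sketch of that decision, assuming the CMOS RAM is modelled as a
plain byte array and the century is stored in BCD. CMOS_SIZE,
bin2bcd8() and write_century() are hypothetical names, not the driver's
own.

```c
#include <assert.h>
#include <stdint.h>

#define CMOS_SIZE 128	/* assumed size of the modelled RTC RAM */

/* Convert 0..99 to packed BCD. */
static uint8_t
bin2bcd8(unsigned v)
{
	return ((uint8_t)(((v / 10) << 4) | (v % 10)));
}

/*
 * Program the century byte only when the FADT Century field is non-zero.
 * Returns 1 if the byte was written, 0 if the feature is unsupported.
 */
static int
write_century(uint8_t *cmos, uint8_t century_index, unsigned year)
{
	if (century_index == 0 || century_index >= CMOS_SIZE)
		return (0);
	cmos[century_index] = bin2bcd8((year / 100) % 100);
	return (1);
}
```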
Reviewed by: akumar3@isilon.com, dab, vangyzen
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D33667
(cherry picked from commit e1ef6c0ef2)
The original implementation only supports getting the address from legacy
BIOS (by searching for the SMBIOS_SIG pattern in a fixed address space).
Try to get the SMBIOS table from EFI through efirt (EFI Runtime Services)
first. Continue to search in the legacy BIOS if a NULL address is
returned from EFI.
This way the ipmi function supports both legacy BIOS and UEFI systems.
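The lookup order can be sketched like this, assuming each source is
represented by a callback that returns 0 when it finds nothing. The
names find_smbios(), efi_none(), efi_found() and bios_found() and the
addresses they return are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*smbios_lookup_fn)(void);

/* Ask EFI first; fall back to the legacy BIOS scan only if EFI gave nothing. */
static uint64_t
find_smbios(smbios_lookup_fn from_efi, smbios_lookup_fn from_bios)
{
	uint64_t addr = 0;

	if (from_efi != NULL)
		addr = from_efi();
	if (addr == 0 && from_bios != NULL)
		addr = from_bios();
	return (addr);
}

/* Illustrative stubs with made-up addresses. */
static uint64_t efi_none(void) { return (0); }
static uint64_t efi_found(void) { return (0x7f000000); }
static uint64_t bios_found(void) { return (0xf0000); }
```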
Reviewed by: dab, vangyzen
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D30007
(cherry picked from commit ee8b757a94)
This function will be used for exposing DMI info as sysctls in the
smbios module (in an upcoming review).
While here, add __packed to the structs.
Reviewed by: dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29270
(cherry picked from commit f689cb23b2)
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.
Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.
Reviewed by: rpokala, dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29276
(cherry picked from commit a29bff7a52)
Add it to the x86 GENERIC and MINIMAL kernels
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D28738
(cherry picked from commit d0673fe160)
MFC 3d721de049 ("Fix NFS exports of FUSE file systems for big
directories") missed a case of a uint64_t from HEAD that should be a
u_long in 13 due to KPI differences. Specifically, HEAD has b214fcceac
("Change VOP_READDIR's cookies argument to a **uint64_t"), but stable/13
does not.
This is a direct commit to stable/13.
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
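A simplified model of the intended behaviour, assuming the directory is
an in-memory array and that a cookie of 0 means "start of directory".
struct dent, read_batch() and resume_after() are hypothetical stand-ins
and do not reflect the fusefs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 8

struct dent {
	uint64_t d_off;		/* cookie identifying this entry */
	const char *name;
};

/* Stand-in for one FUSE_READDIR: copy up to max entries starting at pos. */
static size_t
read_batch(const struct dent *dir, size_t n, size_t pos, struct dent *out,
    size_t max)
{
	size_t i;

	for (i = 0; i < max && pos + i < n; i++)
		out[i] = dir[pos + i];
	return (i);
}

/*
 * Re-read from the first entry, one batch at a time, until the entry
 * carrying the cookie is seen. Returns the index of the entry after it,
 * or n if the cookie is never found.
 */
static size_t
resume_after(const struct dent *dir, size_t n, uint64_t cookie, size_t batch)
{
	struct dent buf[BATCH_MAX];
	size_t got, i, pos = 0;

	if (cookie == 0)
		return (0);
	if (batch == 0 || batch > BATCH_MAX)
		batch = BATCH_MAX;
	for (;;) {
		got = read_batch(dir, n, pos, buf, batch);
		if (got == 0)
			return (n);
		for (i = 0; i < got; i++)
			if (buf[i].d_off == cookie)
				return (pos + i + 1);
		pos += got;	/* keep issuing reads past the first batch */
	}
}
```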
Reviewed by: rmacklem
(cherry picked from commit d088dc76e1)
fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT, d_off is
guaranteed to be valid for the lifetime of the directory entry, so there
is no need to read the directory from the start.
Reviewed by: rmacklem
(cherry picked from commit 4a6526d84a)
fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness and results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
(cherry picked from commit 00134a0789)
fusefs: fix the build without INVARIANTS after 00134a0789
MFC with: 00134a0789
Reported by: se
(cherry picked from commit 18ed2ce77a)
After b5d227b0 FreeBSD was panicking on boot with "Duplicate free" in
UMA. Analyzing the asm, the '1' mask was treated as an integer, rather
than a long, causing 'slw' (shift left word) to be used for the shifting
instruction, not 'sld' (shift left double). This means the upper bits
of the bitfield were not getting used, resulting in corruption of the
bitfield.
While fixing this, the 'and' check of the mask does not need to be
recorded, so don't record (drop the '.').
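The effect of the narrow shift can be modelled in portable C. This is
only a sketch: mask_word() imitates a word-sized shift by truncating
the result to 32 bits, and testandset() is a plain, non-atomic helper.
None of these names come from the powerpc atomic code.

```c
#include <assert.h>
#include <stdint.h>

/* Models a word-sized shift: bits at or above position 32 are lost. */
static uint64_t
mask_word(unsigned bit)
{
	return ((uint32_t)(UINT64_C(1) << (bit & 63)));
}

/* Models a doubleword-sized shift: all 64 bit positions are reachable. */
static uint64_t
mask_doubleword(unsigned bit)
{
	return (UINT64_C(1) << (bit & 63));
}

/* Non-atomic test-and-set: returns the old state of the masked bit. */
static int
testandset(uint64_t *word, uint64_t mask)
{
	int was_set = (*word & mask) != 0;

	*word |= mask;
	return (was_set);
}
```

With the truncated mask, a bit in the upper half of the bitfield is
never recorded as set, which is how the same item can appear free
twice.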
(cherry picked from commit aa4736459e)
Add machine-optimized implementations for the following:
* atomic_testandset_int
* atomic_testandclear_int
* atomic_testandset_long
* atomic_testandclear_long
This fixes the build with ISA_206_ATOMICS enabled.
Add the associated atomic_testandset_32, atomic_testandclear_32, so
that ice(4) can potentially build.
(cherry picked from commit b5d227b0b2)
* New error_flags that can be used from the error ithread and elsewhere
without a synch_op.
* Stop the adapter immediately in t4_fatal_err but defer most of the
rest of the handling to a task. The task is allowed to sleep, unlike
the ithread. Remove async_event_task as it is no longer needed.
* Dump the devlog, CIMLA, and PCIE_FW exactly once on any fatal error
involving the firmware or the CIM block. While here, dump some
additional info (see dump_cim_regs) for these errors.
* If both reset_on_fatal_err and panic_on_fatal_err are set then attempt
a reset first and do not panic the system if it is successful.
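
The interaction of the two knobs in the last item can be sketched as
below. It assumes that a failed reset falls through to the panic
setting, which the text above implies but does not state outright.
enum fatal_outcome, handle_fatal() and the reset stubs are hypothetical
names.

```c
#include <assert.h>

enum fatal_outcome { FATAL_RECOVERED, FATAL_PANIC, FATAL_STOPPED };

/* try_reset returns 0 on success, non-zero on failure. */
static enum fatal_outcome
handle_fatal(int reset_on_fatal_err, int panic_on_fatal_err,
    int (*try_reset)(void))
{
	if (reset_on_fatal_err && try_reset() == 0)
		return (FATAL_RECOVERED);	/* reset worked: do not panic */
	if (panic_on_fatal_err)
		return (FATAL_PANIC);
	return (FATAL_STOPPED);			/* adapter stays stopped */
}

/* Illustrative stubs. */
static int reset_ok(void) { return (0); }
static int reset_fails(void) { return (-1); }
```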
Sponsored by: Chelsio Communications
(cherry picked from commit e9e7bc8250)
The spinning start time is missing from the calculation due to a
misplaced #endif. Move the #endif back to where it belongs.
Submitted by: Alexander Alexeev <aalexeev@isilon.com>
Reviewed by: bdrewery, mjg
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D31384
(cherry picked from commit 428624130a)
When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.
No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec. Linux and illumos
wait forever. In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c. In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c. However, misbehaving hardware
could hang the system if we wait forever.
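A bounded wait of this kind can be sketched as a poll loop with a
configurable limit. The sketch counts polls instead of measuring time,
and wait_for_idle() and countdown_pending() are hypothetical names, not
the lapic code.

```c
#include <assert.h>

/*
 * Poll until the previous operation is no longer pending or the poll
 * budget runs out. Returns 1 if it went idle, 0 on timeout.
 */
static int
wait_for_idle(int (*pending)(void *), void *arg, unsigned long max_polls)
{
	unsigned long i;

	for (i = 0; i < max_polls; i++)
		if (!pending(arg))
			return (1);
	return (0);
}

/* Illustrative stub: reports "pending" for the next *arg polls. */
static int
countdown_pending(void *arg)
{
	unsigned *left = arg;

	if (*left == 0)
		return (0);
	(*left)--;
	return (1);
}
```

Making max_polls a parameter mirrors the idea of a tunable timeout:
the caller, not the loop, decides how long is too long.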
Reviewed by: mav kib
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29942
(cherry picked from commit 2f32a971b7)
Commit c862d5f2a7 ("riscv: Fix a race in pmap_pinit()") did not really
fix the race. Alan writes,
Suppose that N entries in the L1 tables are in use, and we are in the
middle of the memcpy(). Specifically, we have read the zero-filled
(N+1)st entry from the kernel L1 table. Then, we are preempted. Now,
another core/thread does pmap_growkernel(), which fills the (N+1)st
entry. Finally, we return to the original core/thread, and overwrite
the valid entry with the zero that we earlier read.
Try to fix the race properly, by copying kernel L1 entries while holding
the allpmaps lock. To avoid doing unnecessary work while holding this
global lock, copy only the entries that we expect to be valid.
Fixes: c862d5f2a7 ("riscv: Fix a race in pmap_pinit()")
Reported by: alc, jrtc27
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d5c0a7b6d3)
All pmaps share the top half of the address space. With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet). Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.
When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap. The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates. In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.
Fix the problem by copying the kernel map entries after inserting the
pmap into the list. This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.
Reviewed by: mhorne, jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c862d5f2a7)
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.
Use m_dup() to create independent mbufs instead.
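The difference between a shared copy and an independent one can be
shown with a toy packet type. struct pkt, pkt_share() and pkt_dup() are
hypothetical and much simpler than real mbufs; they only illustrate why
a write through one reference is visible through a shared copy but not
through a duplicate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pkt {
	unsigned char *data;
	size_t len;
};

/* Shared copy: the new packet points at the same storage. */
static struct pkt
pkt_share(const struct pkt *p)
{
	return (*p);
}

/* Independent copy: the new packet owns its own storage (data is NULL on failure). */
static struct pkt
pkt_dup(const struct pkt *p)
{
	struct pkt d = { NULL, 0 };

	d.data = malloc(p->len);
	if (d.data != NULL) {
		memcpy(d.data, p->data, p->len);
		d.len = p->len;
	}
	return (d);
}
```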
Reported by: Richard Russo <toast@ruka.org>
Reviewed by: donner, afedorov
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34319
(cherry picked from commit 36637dd19d)
udp_set_kernel_tunneling() rejects new callbacks if one is already set.
Allow callbacks to be cleared. The use case for this is OpenVPN DCO,
where the socket is opened by userspace and then adopted by the kernel
to run the tunnel. If the DCO interface is removed but userspace does
not close the socket (something the kernel cannot prevent) the installed
callbacks could be called with an invalidated context.
Allow new functions to be set, but only if they're NULL (i.e. allow the
callback functions to be cleared).
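The rule can be modelled in a few lines, assuming a small struct in
place of the UDP control block. struct tun_cb, set_tunneling() and
noop_cb() are hypothetical names and the sketch omits all locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct tun_cb {
	void (*func)(void *);
	void (*icmp)(void *);
	void *ctx;
};

/*
 * Reject replacing installed callbacks with new non-NULL ones, but
 * always allow both to be cleared by passing NULL.
 */
static int
set_tunneling(struct tun_cb *cb, void (*func)(void *), void (*icmp)(void *),
    void *ctx)
{
	if ((func != NULL || icmp != NULL) &&
	    (cb->func != NULL || cb->icmp != NULL))
		return (EBUSY);
	cb->func = func;
	cb->icmp = icmp;
	cb->ctx = ctx;
	return (0);
}

/* Illustrative callback. */
static void
noop_cb(void *arg)
{
	(void)arg;
}
```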
Reviewed by: tuexen
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34288
(cherry picked from commit 995cba5a0c)
Previously we'd always print "out of swap space." This can be
misleading, as there are other reasons an OOM kill can be triggered. In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4a864f624a)
cxgbe_refresh_stats takes into account VI_SKIP_STATS but not
VI_INIT_DONE when deciding whether to read the hardware stats. But
before this change VI_SKIP_STATS was set only for VIs with VI_INIT_DONE.
That meant that cxgbe_refresh_stats always accessed the hardware for
uninitialized VIs, and this is a problem if the adapter is suspended or
in the middle of a reset.
Fix this by setting VI_SKIP_STATS on all VIs during suspend. While
here, ignore VI_INIT_DONE in vi_refresh_stats too to be consistent with
cxgbe_refresh_stats.
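A small model of the fixed logic, assuming made-up flag values. The
macros below reuse the flag names from the text, but their values and
the helpers may_read_hw_stats() and suspend_all() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the driver's real definitions may differ. */
#define VI_INIT_DONE	(1u << 0)
#define VI_SKIP_STATS	(1u << 1)

/* The stats path looks only at VI_SKIP_STATS, not at VI_INIT_DONE. */
static int
may_read_hw_stats(unsigned flags)
{
	return ((flags & VI_SKIP_STATS) == 0);
}

/* On suspend, mark every VI, whether or not it was initialized. */
static void
suspend_all(unsigned *vi_flags, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		vi_flags[i] |= VI_SKIP_STATS;
}
```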
Sponsored by: Chelsio Communications
(cherry picked from commit 08c7dc7fd4)
The hardware is unavailable when the device is suspended or in the
middle of a reset.
Sponsored by: Chelsio Communications
(cherry picked from commit 39a36707bd)
The default sysctl context setup by newbus for a device is eventually
freed by device_sysctl_fini, which runs after the device driver's detach
routine. sysctl nodes associated with this context must not use any
resources (like driver locks, hardware access, counters, etc.) that are
released by driver detach.
There are a lot of sysctl nodes like this in cxgbe(4) and the fix is to
hang them off a context that is explicitly freed by the driver before it
releases any resource that might be used by a sysctl.
This fixes panics when running "sysctl dev.t6nex dev.cc" in a tight loop
and loading/unloading the driver in parallel.
Reported by: Suhas Lokesha
Sponsored by: Chelsio Communications
(cherry picked from commit a727d9531a)
This ensures that the driver reports an error instead of failing
silently when an invalid media type is requested.
Reported by: Suhas Lokesha @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit cdd7fe04cb)
This eliminates error messages like this from the driver when running at
50Gbps with 100G cables:
[3726] cc0: l1cfg failed: 71
[4407] cc0: l1cfg failed: 71
Note that link comes up anyway with or without this change.
Reported by: Suhas Lokesha @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit f3c2987f2f)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CHANGES
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version : 1.26.6.0
Date : 01/03/2022
================================================================================
Fixes
-----
BASE:
- Fixed one module eeprom read failure.
- Fixed an issue with speed selection when 40G and 25G are advertised and
supported.
- Fixed a random traffic hang when T5 receives invalid ets BW in dcbx
messages from a switch.
- Fixed very long link up time with few switches.
================================================================================
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit 3b76242433)
This fixes a driver panic during stats collection when a port's id does
not match its tx channel. The bug affected only the T580 card running
with a non-default VPD.
Reported by: Suhas Lokesha @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit bbab9ab579)
sge->ctrlq is not always allocated during attach (e.g., if firmware
initialization fails) and detach should be able to deal with this.
Sponsored by: Chelsio Communications
(cherry picked from commit b99651c52f)
(Rest is from the README that came with the firmware)
Version : 1.26.4.0
Date : 12/02/2021
Fixes
-----
BASE:
- Fixed error on setting 25G speed on 100G copper with multiple FEC set in
firmware commands.
- Handle link of unknown optics modules by enabling module tx unconditionally.
- Fixed link not coming up for 25G CRS phys. Firmware incorrectly tried to
bring up the link in RS-FEC but as per IEEE spec, it must be BASER FEC.
- Fixed an issue where firmware doesn't automatically retry next FEC if driver
asks to bring up the link using RS-FEC and link doesn't come up.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
(cherry picked from commit 357ba2cf17)
Modify the GPIO pins only on the Base-T cards and even there drive all
of them low instead of putting them in hi-z state. For the rest (this
is the common case), directly power off the PLLs of the high speed
serdes. This is the simplest method that does not involve or conflict
with the firmware but still works with all T4-T6 cards regardless of
what's plugged into the port.
This fixes a problem where the peer wouldn't always see a link down if
it is connected to the device using a -CR4 copper cable.
Sponsored by: Chelsio Communications
(cherry picked from commit a8eacf9329)
Recent firmwares have support for autonomous FEC selection and a "force"
knob to let the driver control this behavior (or not) in a fine grained
manner. This change adds a driver knob so that all the different ways of
configuring the link FEC can be exercised. Note that this controls the
internal driver/firmware interaction for link configuration and is not
meant for general use.
Sponsored by: Chelsio Communications
(cherry picked from commit 448bcd01dc)
Recent firmwares have more leeway in FEC selection and there is a need
to track the FECs requested by the driver separately from the FEC in use
on the link. The existing dev.<port>.<inst>.fec sysctl can read both but
its behavior depends on the link state and it is sometimes hard to find
out what was requested when the link is up.
Split the fec sysctl into two (requested_fec and link_fec) to get access
to both pieces of information regardless of the link state.
Sponsored by: Chelsio Communications
(cherry picked from commit f6a2e1100f)