The netlink newneigh handler has the potential to leak the lock on
llentry objects in the kernel. This patch reconciles several paths
through the newneigh handler that could result in a lock leak.
MFC after: 1 week
Reviewed by: markj, kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42307
Readd bsddialog(1) to bsdconfig(8).
This can be considered an addition, not a replacement: `$DIALOG=dialog'
restores dialog(1), and nothing changes for Xdialog(1). The one
exception is that if an error occurs, bsddialog(1) replaces dialog(1).
When allocating memory we should try to allocate from the NUMA node
closest to the device to reduce cross domain memory traffic. Teach the
arm64 bus_dma code to do this.
While here, use mallocarray to guard against an unlikely integer
overflow.
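The overflow in question is the multiplication of element count by
element size. A minimal userland sketch of that guard follows;
checked_array_alloc() is a hypothetical helper for illustration, not the
kernel's mallocarray(9), and the kernel routine's own failure behaviour
is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userland sketch only (hypothetical helper). Refuse requests whose
 * total size nmemb * size would wrap around size_t instead of
 * silently allocating a buffer that is too small.
 */
static void *
checked_array_alloc(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);		/* nmemb * size would overflow */
	return (malloc(nmemb * size));
}
```

A caller that computes the count from device-supplied values gets NULL
back rather than an undersized allocation.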
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42187
Configuring a FreeBSD laptop, my fingers kept wanting to type
`bsdconfig network' and I could not figure out why this was not working.
Took me a second to realize that the shortcut was `bsdconfig networking'
for where I wanted to go.
Reviewed by: jhb
Approved by: jhb
Differential Revision: https://reviews.freebsd.org/D42242
share/man/man4/Makefile adds a number of
variables to MAN and MLINKS, which are only set for
certain architectures.
The empty variables wreak havoc when := is used.
Add :M*.[1-9] to MLINKS reference for STAGE_LINKS.mlinks
to avoid invalid results.
Reviewed by: stevek
This makes it possible to see, while sampling, which lock a thread is
actively waiting for. Without the change one would only see a bunch of
calls to lock_delay(), where the stack trace often does not reveal what
the lock might be.
Note this is not the same as lock profiling, which only produces data
for cases which wait for locks.
struct thread already has a td_lockname field, but I did not use it
because it has different semantics -- it denotes when the thread is off
cpu. At the same time it could not be converted to hold a lock_object
pointer because non-curthread access would no longer be guaranteed to be
safe -- by the time the reader dereferences the pointer the lock might
have been taken, released and the object containing it freed.
Sample usage with dtrace:
rm /tmp/out.kern_stacks ; dtrace -x stackframes=100 -n 'profile-997 { @[curthread->td_wantedlock != NULL ? stringof(curthread->td_wantedlock->lo_name) : stringof("\n"), stack()] = count(); }' -o /tmp/out.kern_stacks
This also facilitates addition of lock information to traces produced by
hwpmc.
Note: spinlocks are not supported at the moment.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Notable upstream pull request merges:
#14378 c0e58995e Large sync writes perform worse with slog
#14721 797f55ef1 Do not persist user/group/project quota zap objects
                 when unneeded
#15356 380c25f64 FreeBSD: Improve taskq wrapper
#15390 4fbc52495 Remove lock from dsl_pool_need_dirty_delay()
#15397 57b409856 Trust ARC_BUF_SHARED() more
#15402 b29e98fa8 Properly pad struct tx_cpu to cache line
#15405 ea30b5a9e Set spa_ccw_fail_time=0 when expanding a vdev
#15416 b9384b949 FreeBSD: taskq: Remove unused declaration
Obtained from: OpenZFS
OpenZFS commit: 797f55ef12
During testing at a recent IETF NFSv4 Bakeathon, a non-FreeBSD
server was rebooted. After the reboot, the FreeBSD client sent
an Open/Claim_previous with a Getattr after the Open in the same
compound. The Open/Claim_previous was done to recover the Open
and a Delegation for a file. The Open succeeded, but the
Getattr after the Open failed with NFSERR_DELAY. This resulted
in the FreeBSD client retrying the entire RPC over and over again,
until the server's recovery grace period ended. Since the Open
succeeded, there was no need to retry the entire RPC.
This patch modifies the NFSv4 client side recovery Open/Claim_previous
RPC reply handling to deal with this case. With this patch, the
Getattr reply of NFSERR_DELAY is ignored and the successful Open
reply is processed.
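The decision described above can be sketched as follows. This is an
illustration only: recover_classify() and the enum are hypothetical
names, and treating any other Getattr error as a failure is an
assumption, not something this patch states. NFSERR_DELAY uses the
protocol value for NFS4ERR_DELAY.

```c
#define	NFSERR_DELAY	10008	/* NFS4ERR_DELAY */

/* Hypothetical outcome of examining one recovery compound reply. */
enum recover_action {
	RECOVER_RETRY,		/* resend the whole RPC */
	RECOVER_USE_OPEN,	/* process the successful Open reply */
	RECOVER_FAIL		/* give up on this recovery attempt */
};

static enum recover_action
recover_classify(int open_status, int getattr_status)
{
	if (open_status == NFSERR_DELAY)
		return (RECOVER_RETRY);
	if (open_status != 0)
		return (RECOVER_FAIL);
	/* Open succeeded: a delayed Getattr is ignored, not retried. */
	if (getattr_status == 0 || getattr_status == NFSERR_DELAY)
		return (RECOVER_USE_OPEN);
	return (RECOVER_FAIL);	/* assumption: other errors still fail */
}
```

The point of the sketch is the third branch: once the Open has
succeeded, NFSERR_DELAY on the trailing Getattr no longer sends the
client back into the retry loop.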
This bug will not normally affect users, since this non-FreeBSD
server is not widely used (it may not even have shipped to any
customers).
MFC after: 1 month
The dead_bpf_if is not supposed to be written to. Make it const so
that a destructive write to it panics the kernel instead of silently
corrupting memory.
No functional change intended.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42189
The following loader tunables do have corresponding sysctl MIBs, but
with inconsistent naming, probably for historical reasons. Let's prefer
consistent naming for them so that they are easier to maintain.
1. hw.dmar.timeout -> hw.iommu.dmar.timeout
2. hw.lapic_eoi_suppression -> hw.apic.eoi_suppression
3. hw.lapic_tsc_deadline -> hw.apic.timer_tsc_deadline
4. hw.x2apic_enable -> hw.apic.x2apic_mode
These tunables are for field debugging, so there is no need to keep the
old names for compatibility.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42248
The sysctl knob 'vm.pmap.allow_2m_x_ept' is a loader tunable and has a
public documentation entry in security(7), but it is fetched from the
kernel environment as 'hw.allow_2m_x_ept'. That is inconsistent and
obscure.
As the public security advisory FreeBSD-SA-19:25.mcepsc [1] exists,
people may refer to it and use 'hw.allow_2m_x_ept', so keep the old
name for compatibility.
[1] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:25.mcepsc.asc
Reviewed by: kib
Fixes: c08973d09c Workaround for Intel SKL002/SKL012S errata
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42311
For NFSv4.1/4.2, there are two new options for the Open operation.
These two options use the file handle for the file instead of the
file handle for the directory plus a file name. By doing so, the
client code is simplified (it no longer needs the "nfsv4node" structure
attached to the NFS vnode). It also avoids problems caused by another
NFS client (or process running locally in the NFS server) doing a
rename or remove of the file name between the Lookup and Open.
Unfortunately, there was a bug (fixed recently by commit X)
in the NFS server which mis-parsed the Claim_Deleg_Cur_FH
arguments. To allow this patch to work with the broken FreeBSD
NFSv4.1/4.2 server, NFSMNTP_BUGGYFBSDSRV is defined and is set
when a correctly formatted Claim_Deleg_Cur_FH fails with NFSERR_EXPIRED.
(This is what the old, broken NFS server does, since it erroneously
uses the Getattr arguments as a stateID.) Once this flag is set,
the client fills in a stateID, to make the broken NFS server happy.
Tested at a recent IETF NFSv4 Bakeathon.
MFC after: 1 month
Support early printf for the ns8250 uart driver. Adding
	options UART_NS8250_EARLY_PORT=0xYYY
	options EARLY_PRINTF
to your kernel config will enable it. The code is rather simple minded,
so caveat emptor. This will enable printf before cninit. cninit
automatically disables this and switches to the real routine. It only
works for port-mapped COM ports, and only if you know the port's address
at compile time. It's intended to be a debugging aid, not a general
purpose thing.
Sponsored by: Netflix
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D42306
Unlike MSI-X, when a device uses multiple MSI interrupts, the entire
group of interrupts is enabled/disabled at once in the relevant PCI
config register. Currently, the interrupt code enables the IDT vector
for each MSI interrupt when a handler is first registered. If the PCI
device triggers an MSI interrupt which doesn't yet have a handler,
this can trigger a panic when the Xrsvd ISR executes rather than
treating it as a stray device interrupt.
To fix, enable all the IDT vectors for an MSI group when the first
interrupt handler is configured, and don't disable the IDT vectors
until the last interrupt handler for the group is torn down.
When migrating an MSI group between CPUs, enable/disable the entire
group of IDT vectors if at least one interrupt handler is configured
for the group.
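The first-handler/last-handler rule amounts to reference counting on the
group. A minimal sketch under that reading follows; struct msi_group and
the helper names are hypothetical, and a boolean array stands in for the
real IDT vector state.

```c
#include <stdbool.h>

#define	MSI_GROUP_MAX	32	/* MSI allows at most 32 vectors */

/* Hypothetical model of one MSI group. */
struct msi_group {
	int	count;				/* vectors in the group */
	int	handlers;			/* handlers registered */
	bool	idt_enabled[MSI_GROUP_MAX];	/* stand-in for IDT state */
};

static void
msi_group_set_vectors(struct msi_group *g, bool on)
{
	for (int i = 0; i < g->count; i++)
		g->idt_enabled[i] = on;
}

/* The first handler in the group enables every vector of the group. */
static void
msi_group_add_handler(struct msi_group *g)
{
	if (g->handlers++ == 0)
		msi_group_set_vectors(g, true);
}

/* Only tearing down the last handler disables the vectors again. */
static void
msi_group_remove_handler(struct msi_group *g)
{
	if (--g->handlers == 0)
		msi_group_set_vectors(g, false);
}
```

With this shape, an interrupt arriving on a vector that has no handler
yet still lands on an enabled vector and can be treated as stray.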
Reported by: jhay
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42232
In the zfs_id_over*quota functions, there is a short-circuit to skip
the zap_lookup when the quota zap does not exist. If quotas are never
used in a zpool, then the quota zap will never exist. But if
user/group/project quotas are ever used, the zap objects will be
created and will persist even if the quotas are deleted.
The quota zap_lookup in the write path can become a bottleneck for
write-heavy small I/O workloads. Before this commit, it was not
possible to remove this lookup without creating a new zpool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam Atkinson <samatk@amazon.com>
Closes #14721
When invoking certctl as part of installworld, set LOCALBASE in the
environment to match the build environment. That ensures that LOCALBASE
is non-empty on systems without the user.localbase sysctl and avoids
allowing a system configuration detail to leak into the build. Users
who wish to build targeting a non-standard LOCALBASE should ensure it is
set in src.conf or similar.
Reviewed by: Mina Galić <freebsd@igalic.co>
Differential Revision: https://reviews.freebsd.org/D40530
Document the LOCALBASE variable and that it's set to user.localbase by
default. Update path defaults that depend on it.
Reviewed by: bcr
Differential Revision: https://reviews.freebsd.org/D40529
In my understanding ARC_BUF_SHARED() and arc_buf_is_shared() should
return identical results, except the second also asserts it deeper.
The first is much cheaper though, saving a few pointer dereferences.
Replace production arc_buf_is_shared() calls with ARC_BUF_SHARED(),
and call arc_buf_is_shared() in random assertions, while making it
even more strict.
In my tests this halves arc_buf_destroy_impl() time, which
noticeably reduces hash_lock congestion under heavy dbuf eviction.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15397
Torn reads/writes of dp_dirty_total are unlikely: on 64-bit systems
due to register size, while on 32-bit due to memory constraints.
And even if we hit some race, the code implementing the delay takes
the lock anyway.
Removal of the pool-wide lock acquisition saves ~1% of CPU time on
an 8-thread 8KB write workload.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15390
Once a zpool scrub is triggered, all zpool/zfs commands get stuck for
180 seconds. After 180 seconds the zpool/zfs commands start executing,
but it takes a few more seconds (about 10) for the status to update.
Hence sleep for 200 seconds so that we get the correct status.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: vaibhav.bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15364
We already use ____cacheline_aligned in many places, so add one more
instead of the seemingly arbitrary char tc_pad[8].
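As an illustration of why an alignment attribute is more robust than a
hand-sized pad, here is a small C11 sketch. struct percpu_sketch and its
fields are hypothetical and are not the real struct tx_cpu; the 64-byte
line size is an assumption, and standard _Alignas is used in place of
the ____cacheline_aligned macro.

```c
#define	CACHE_LINE	64	/* assumed line size; varies by platform */

/* Hypothetical per-CPU structure; not the fields of struct tx_cpu. */
struct percpu_sketch {
	_Alignas(CACHE_LINE) long	lock;
	long				count[4];
};

/* The compiler rounds the size up, so no hand-computed pad is needed. */
_Static_assert(sizeof(struct percpu_sketch) % CACHE_LINE == 0,
    "adjacent array elements must not share a cache line");
```

If a field is later added or removed, the size is still rounded up to a
whole number of cache lines, whereas a fixed char pad[8] would have to
be recomputed by hand.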
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15402
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Closes #15417
Variable 'uma_align_cache' has not been used since commit "FreeBSD: Use
a hash table for taskqid lookups" (3933305ea). Moreover, it is soon
going to become private to FreeBSD's UMA in 15.0-CURRENT (main),
14.0-STABLE (stable/14) and 13.2-STABLE (stable/13). Should accessing
this information become necessary again, one will have to use the new
accessors for recent versions.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Certner <olce.freebsd@certner.fr>
Closes #15416
The hack with the .xtmp file was effectively making make
ignore changes to the sources, breaking NO_CLEAN builds. The
hack can actually be omitted, as setting SCRIPTSNAME_${_T} for
every test is sufficient to prevent renaming by bsd.prog.mk.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42283
When a vdev is to be expanded -- either via `zpool online -e` or via
the autoexpand option -- a SPA_ASYNC_CONFIG_UPDATE request is queued
to be handled via an asynchronous worker thread (spa_async_thread).
This normally happens almost immediately; but will be delayed up to
zfs_ccw_retry_interval seconds (default 5 minutes) if an attempt to
write the zpool configuration cache failed.
When FreeBSD boots ZFS-root VM images generated using `makefs -t zfs`,
the zpoolupgrade rc.d script runs `zpool upgrade`, which modifies the
pool configuration and triggers an attempt to write to the cache file.
This attempted write fails because the filesystem is still mounted
read-only at this point in the boot process, triggering a 5-minute
cooldown before SPA_ASYNC_CONFIG_UPDATE requests will be handled by
the asynchronous worker thread.
When expanding a vdev, reset the "when did a configuration cache
write last fail" value so that the SPA_ASYNC_CONFIG_UPDATE request
will be handled promptly. A cleaner but more intrusive option would
be to use separate SPA_ASYNC_ flags for "configuration changed" and
"try writing the configuration cache again", but with FreeBSD 14.0
coming very soon I'd prefer to leave such refactoring for a later
date.
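The effect of resetting the failure timestamp can be sketched as below.
This is a simplified model with hypothetical names (struct pool_state,
config_update_allowed(), request_expand()) and times in plain seconds;
it only illustrates the gate described above, not the actual SPA code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant pool state. */
struct pool_state {
	uint64_t ccw_fail_time;	/* last failed cache write, 0 = none */
};

/* A config update runs only if no recent cache write failure exists. */
static bool
config_update_allowed(const struct pool_state *p, uint64_t now,
    uint64_t retry_interval)
{
	return (p->ccw_fail_time == 0 ||
	    now - p->ccw_fail_time > retry_interval);
}

/* Expanding a vdev forgets the earlier failure, lifting the cooldown. */
static void
request_expand(struct pool_state *p)
{
	p->ccw_fail_time = 0;
}
```

With a failure recorded at t=100 and a 300 second interval, an update
requested at t=150 is deferred; after request_expand() the same request
is handled immediately.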
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@FreeBSD.org>
Closes #15405
The change is simple -- restore the original code so that the VDEV
path is updated when using by-id paths. The more challenging part
was to devise a second ZTS test, that would test auto-replace for
'by-id' and help prevent a future regression.
With that new test, we can now do an A/B test with, and without,
the fix to confirm that auto-replace for by-id paths works. The
existing auto-replace test, functional/fault/auto_replace_001_pos,
will confirm that we didn't break auto-replace for 'by-vdev' paths.
In the original functional/fault/auto_replace_001_pos test, the disk
wipe (using dd) was not effective in removing the partitioning since
the kernel was never informed of the wipe.
Added a call to wipefs(8) so that the kernel is informed and ZED will
re-partition the device.
Added a validation step that the re-partitioning occurred by
confirming that the GPT partition UUID changes.
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15363
To support Pointer Authentication (PAC) in assembly files add a pair of
macros that sign the link register. When used before storing to the
stack it will allow hardware to detect if it has changed before using
it in the return instruction.
Reviewed by: markj, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42226
To detect when an object file is built with the Branch Target
Identification (BTI) and Pointer Authentication Code (PAC) extensions
there is an ELF note the compiler will insert. It will only do so from
a high level language, e.g. C or C++.
To get the note in assembly add the GNU_PROPERTY_AARCH64_FEATURE_1_NOTE
macro that can be used to create it, and the
GNU_PROPERTY_AARCH64_FEATURE_1_VAL macro to insert the correct value
based on which combination of BTI and PAC are enabled.
Reviewed by: markj (earlier version), emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42225
Ubuntu 23.10 uses glibc 2.38. This adds strlcpy and strlcat so we need
to remove them from the cross build environment.
Reviewed by: jrtc27 (earlier version), arichardson
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42303
When preprocessing assembly files with clang or gcc the __ASSEMBLER__
macro is defined. Check for this as an alternative to LOCORE in
elf_common.h so it can be included by .S files.
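A sketch of the guard pattern, for illustration only. This is a
hypothetical header fragment rather than the contents of elf_common.h;
EXAMPLE_NOTE_TYPE and example_note_t are made-up names.

```c
/* Plain constants are safe for both C and preprocessed .S files. */
#define	EXAMPLE_NOTE_TYPE	5

#if !defined(LOCORE) && !defined(__ASSEMBLER__)
/* C-only declarations are hidden from the assembler. */
typedef struct {
	unsigned int	n_namesz;
	unsigned int	n_descsz;
	unsigned int	n_type;
} example_note_t;
#endif
```

When such a header is included from a .S file, the compiler driver
defines __ASSEMBLER__, so only the #define lines survive preprocessing
and the typedef never reaches the assembler.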
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42224
I would like to use this file as an example of the FREEBSD-upgrade
convention, see D42302. libarchive is picked somewhat arbitrarily as a
longstanding piece of contrib software in FreeBSD.
- Remove SVN references (HEAD/trunk)
- Mention the vendor/libarchive git branch
- Update link to import instructions
- Remove $FreeBSD$
Reviewed by: mm, imp, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42308
The driver has a tunable hw.xn.enable_lro which is intended to control
whether LRO is enabled. This is currently non-functional - even if it's
set to zero, the driver still requests LRO support from the backend.
This change fixes the feature so that if enable_lro is set to zero, LRO
no longer appears in the interface capabilities and LRO is not requested
from the backend.
PR: 273046
MFC after: 1 week
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D41439
- If an untrusted certificate is also found in the list of trusted
certificates, issue a warning and skip it, but don't fail.
- Split on -+BEGIN CERTIFICATE-+ instead of "Certificate:" since
that's what we're really looking for.
Also fix a long-standing bug: .crl files are not certificates, so we
should not include them when searching for certificates.
Reported by: madpilot, netchild, tijl
Reviewed by: netchild, allanjude
Differential Revision: https://reviews.freebsd.org/D42276
It has been observed that netvsc's send rings can stall on the latest
Azure Boost platforms. This is because the vmbus_rxbr_read() routine
doesn't check whether the host is waiting for more room to put data,
which leaves the host side sleeping forever on this vmbus channel. The
problem was only observed on the latest platform because the host
requests larger buffer ring room to be available, which makes the
issue much more likely to occur.
Fix this by adding a check to the vmbus_rxbr_read() call and signaling
the host in the callers if the check is positive.
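The kind of check involved can be sketched as follows. This is a
simplified model, not the driver code: struct rxbr_sketch, its fields
and rxbr_consume() are hypothetical, and the rule of signaling only when
this read crosses the host's requested threshold is an assumption about
how such a check is usually written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a receive ring shared with the host. */
struct rxbr_sketch {
	uint32_t size;		/* usable bytes in the ring */
	uint32_t used;		/* bytes written by host, not yet read */
	uint32_t host_wants;	/* room the host waits for, 0 = none */
};

/*
 * Consume len bytes (len <= used is assumed). Return true if the
 * caller should signal the host that enough room is now available.
 */
static bool
rxbr_consume(struct rxbr_sketch *br, uint32_t len)
{
	uint32_t free_before = br->size - br->used;

	br->used -= len;
	if (br->host_wants == 0)
		return (false);		/* host is not waiting */
	uint32_t free_after = br->size - br->used;
	/* Signal only when this read made the requested room available. */
	return (free_before < br->host_wants &&
	    free_after >= br->host_wants);
}
```

Returning the decision to the caller mirrors the split described above:
the read routine performs the check and the callers do the signaling.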
Reported by: NetApp
Tested by: whu
MFC after: 3 days
Sponsored by: Microsoft
The benefit is that in the debugger you will see PF_DIVERT_MTAG_DIR_IN
instead of 1 when looking at a structure, and that compilation fails
if anybody sets it to a wrong value. Using "port" instead of "ndir" when
assigning a port improves the readability of the code.
Suggested by: glebius
MFC after: 3 weeks
X-MFC-With: fabf705f4b