All of the kern_* prototypes belong in this header. While here, sort
the prototypes by function name.
Reviewed by: dchagin
Fixes: 6453d4240f vfs: Export exattr methods to reuse by Linuxulator
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D41766
(cherry picked from commit 3555be0124a4f105c72d932f00071f332691e8cf)
As of LLVM 16, the -fsanitize-memory-param-retval option is set to true
by default, meaning that MSan will eagerly report uninitialized function
parameters and return values, even if they are not used. A
witness_save()/witness_restore() call pair fails this test since
witness_save() may return before saving file and line number
information.
Modify witness_save() to initialize the out-params unconditionally; this
appears to be the only instance of the problem triggered when booting to
a login prompt, so let's just address it directly.
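A minimal sketch of the pattern, using hypothetical names (`struct lock_sketch`, `lock_save_sketch()`) rather than the actual witness code. Every out-parameter is written before any early return, so a sanitizer that checks return-time state never sees uninitialized values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock that may carry location info. */
struct lock_sketch {
	int		 tracked;	/* nonzero if location info is recorded */
	const char	*file;
	int		 line;
};

/*
 * Hypothetical analogue of witness_save(): initialize the out-params
 * unconditionally, then fill them in only when there is data to save.
 */
static void
lock_save_sketch(const struct lock_sketch *lk, const char **filep, int *linep)
{
	assert(filep != NULL && linep != NULL);

	/* Initialize first so that no return path leaves them undefined. */
	*filep = NULL;
	*linep = 0;

	if (lk == NULL || !lk->tracked)
		return;		/* early return: out-params already defined */

	*filep = lk->file;
	*linep = lk->line;
}
```

A caller that later passes the saved values to a restore routine now always hands over defined data, even when nothing was recorded.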
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
MFC after: 1 week
(cherry picked from commit 7123222220aa563dc16bf1989d335722e4ff57a6)
In the LinuxKPI, PAGE_MASK is the bitwise complement of FreeBSD's
PAGE_MASK, so the original assertion was simply incorrect.
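For illustration, the two conventions side by side, assuming a 4 KiB page. The macro names are hypothetical and prefixed to avoid clashing with system headers. Under FreeBSD's definition a page-aligned address satisfies `(va & PAGE_MASK) == 0`; under the LinuxKPI definition the equivalent test is `(va & ~PAGE_MASK) == 0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096UL
/* FreeBSD convention: the mask selects the offset bits within a page. */
#define FREEBSD_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)
/* LinuxKPI convention: the mask selects the page-frame bits. */
#define LINUX_PAGE_MASK		(~(SKETCH_PAGE_SIZE - 1))

_Static_assert(FREEBSD_PAGE_MASK == ~LINUX_PAGE_MASK,
    "the two masks are bitwise complements");

static bool
aligned_freebsd(uintptr_t va)
{
	return ((va & FREEBSD_PAGE_MASK) == 0);
}

static bool
aligned_linux(uintptr_t va)
{
	/* Reusing the FreeBSD-style test with this mask would be wrong. */
	return ((va & ~LINUX_PAGE_MASK) == 0);
}
```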
Reported by: trasz
Tested by: trasz
Fixes: 6223d0b67af9 ("linuxkpi: Handle direct-mapped addresses in linux_free_kmem()")
(cherry picked from commit f88bd1174aab1aff7fea7241ab6e103e769d2d7a)
See the analysis in PR 271333. It is possible for driver code to
allocate a page, store its address as returned by page_address(), then
call free_page() on that address. On most systems that'll result in the
LinuxKPI calling kmem_free() with a direct-mapped address, which is not
legal.
Fix the problem by making linux_free_kmem() check the address to see
whether it's direct-mapped or not, and handling it appropriately.
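A userspace model of the dispatch, assuming a hypothetical direct-map window (the `SKETCH_DMAP_MIN`/`SKETCH_DMAP_MAX` values are made up for illustration) and hypothetical free backends that only record which path was taken. The real code uses the platform's direct-map bounds and the kernel's page and kmem free routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical direct-map window; real bounds are platform-specific. */
#define SKETCH_DMAP_MIN	0xfffff80000000000ULL
#define SKETCH_DMAP_MAX	0xfffffc0000000000ULL

enum free_path { FREED_NONE, FREED_AS_PAGE, FREED_AS_KMEM };

static enum free_path last_path = FREED_NONE;

/* Hypothetical backends standing in for the real free routines. */
static void
free_direct_mapped_page(uint64_t va)
{
	(void)va;
	last_path = FREED_AS_PAGE;
}

static void
free_kmem_range(uint64_t va)
{
	(void)va;
	last_path = FREED_AS_KMEM;
}

static int
is_direct_mapped(uint64_t va)
{
	return (va >= SKETCH_DMAP_MIN && va < SKETCH_DMAP_MAX);
}

/* Dispatch on the kind of address instead of assuming a kmem address. */
static void
free_kmem_sketch(uint64_t va)
{
	if (is_direct_mapped(va))
		free_direct_mapped_page(va);	/* e.g. from page_address() */
	else
		free_kmem_range(va);		/* e.g. from a kmem allocation */
}
```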
PR: 271333, 274515
Reviewed by: hselasky, bz
Tested by: trasz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40028
(cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)
Make sure that we don't try to copy with a negative resid.
Make sure that we don't walk off the end of the iovec array.
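A minimal userspace sketch of both guards, using POSIX `struct iovec`. `copy_from_iov()` is a hypothetical helper, not the kernel routine that was changed. The loop stops when the remaining count is no longer positive or when it runs out of iovec entries, whichever comes first.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Gather up to 'resid' bytes from 'iov' into 'dst' and return the
 * number of bytes copied.  Hypothetical helper for illustration.
 */
static size_t
copy_from_iov(char *dst, const struct iovec *iov, int iovcnt, ssize_t resid)
{
	size_t done = 0;

	/* Guard 1: never copy with a negative (or zero) residual count. */
	/* Guard 2: never index past the last iovec entry. */
	for (int i = 0; resid > 0 && i < iovcnt; i++) {
		size_t n = iov[i].iov_len;

		if ((size_t)resid < n)
			n = (size_t)resid;
		memcpy(dst + done, iov[i].iov_base, n);
		done += n;
		resid -= (ssize_t)n;
	}
	return (done);
}
```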
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42098
(cherry picked from commit 8fd0ec53deaad34383d4b344714b74d67105b258)
Accesses to KMSAN's TLS block are not instrumented, so there's no need
to use kmsan_memset(). No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
(cherry picked from commit e5caed14067b40f1454d74e99789a28508d0eea3)
Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f3.
Reviewed by: alc, markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41634
(cherry picked from commit 808f5ac3c6dcbe38f505c0c843b0a10ae154e6ec)
If a request ends up growing beyond the initially allocated space, the
netlink functions (such as snl_add_msg_attr_u32()) will allocate a
new buffer. This invalidates the header pointer we may have received
from snl_create_msg_request(). Always use the hdr returned by
snl_finalize_msg().
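The hazard is the usual one with growable buffers: any pointer into the old buffer dangles after a reallocation. A minimal sketch with a hypothetical `struct msgbuf` and helpers (not the snl(3) API): the header pointer is derived from the current buffer only at finalization time, analogous to using the hdr returned by snl_finalize_msg() instead of one cached earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable message buffer. */
struct msgbuf {
	char	*base;
	size_t	 len;
	size_t	 cap;
};

struct hdr {
	uint32_t total_len;
};

/* Append bytes, growing (and possibly moving) the buffer. */
static int
msg_append(struct msgbuf *mb, const void *data, size_t n)
{
	if (mb->len + n > mb->cap) {
		size_t ncap = (mb->len + n) * 2;
		char *nb = realloc(mb->base, ncap);	/* may move the data */

		if (nb == NULL)
			return (-1);
		mb->base = nb;
		mb->cap = ncap;
	}
	memcpy(mb->base + mb->len, data, n);
	mb->len += n;
	return (0);
}

/* Re-derive the header from the current buffer instead of caching it. */
static struct hdr *
msg_finalize(struct msgbuf *mb)
{
	struct hdr *h = (struct hdr *)mb->base;

	h->total_len = (uint32_t)mb->len;
	return (h);
}
```

A `struct hdr *` taken right after the first append could point into freed memory once the payload append reallocates; the pointer returned by `msg_finalize()` cannot.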
Reviewed by: melifaro
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42223
(cherry picked from commit 4f8f43b06ed07e96a250855488cc531799d5b78f)
The following loader tunables do have corresponding sysctl MIBs, but
with inconsistent naming. That may be for historical reasons. Let's prefer
consistent naming for them so that they will be easier to maintain.
1. hw.dmar.timeout -> hw.iommu.dmar.timeout
2. hw.lapic_eoi_suppression -> hw.apic.eoi_suppression
3. hw.lapic_tsc_deadline -> hw.apic.timer_tsc_deadline
4. hw.x2apic_enable -> hw.apic.x2apic_mode
These tunables are for field debugging, so there is no need to keep the
old names for compatibility.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42248
(cherry picked from commit 12cce5994b92f8235f379d660ccb28da8e69f55b)
The sysctl knob 'vm.pmap.allow_2m_x_ept' is a loader tunable and has a
public documentation entry in security(7), but it is fetched from the
kernel environment variable 'hw.allow_2m_x_ept'. That is inconsistent
and obscure.
Because people may have followed the public security advisory
FreeBSD-SA-19:25.mcepsc [1] and used 'hw.allow_2m_x_ept', keep the old
name for compatibility.
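One way to honor both names is to look up the current name first and fall back to the legacy alias. The sketch below models the kernel environment as a hypothetical in-memory table with a hypothetical `env_get()` helper; the kernel itself uses its tunable-fetch machinery rather than these functions, and the precedence shown (current name wins when both are set) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the kernel environment. */
struct kenv_entry {
	const char *name;
	const char *value;
};

static const char *
env_get(const struct kenv_entry *env, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(env[i].name, name) == 0)
			return (env[i].value);
	return (NULL);
}

/*
 * Fetch an integer tunable by its current name, falling back to a
 * legacy alias.  Precedence (current name wins) is an assumption.
 */
static int
fetch_int_tunable(const struct kenv_entry *env, size_t n, const char *name,
    const char *legacy, int defval)
{
	const char *v = env_get(env, n, name);

	if (v == NULL && legacy != NULL)
		v = env_get(env, n, legacy);
	return (v != NULL ? atoi(v) : defval);
}
```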
[1] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:25.mcepsc.asc
Reviewed by: kib
Fixes: c08973d09c Workaround for Intel SKL002/SKL012S errata
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42311
(cherry picked from commit 9e7f349ff10691c2e3fb03898dbc942794a47566)
The following loader tunables do have corresponding sysctl MIBs, but
with different names. That may be for historical reasons. Let's prefer
consistent naming for them so that they will be easier to read and
maintain.
1. hw.vmm.l1d_flush -> hw.vmm.vmx.l1d_flush
2. hw.vmm.l1d_flush_sw -> hw.vmm.vmx.l1d_flush_sw
3. hw.vmm.vmx.use_apic_pir -> hw.vmm.vmx.cap.posted_interrupts
4. hw.vmm.vmx.use_apic_vid -> hw.vmm.vmx.cap.virtual_interrupt_delivery
5. hw.vmm.vmx.use_tpr_shadowing -> hw.vmm.vmx.cap.tpr_shadowing
Old names are kept for compatibility.
Also, add the sysctl flag CTLFLAG_TUN to them so that `sysctl -T` will
report them correctly.
Reviewed by: corvink, jhb, kib, #bhyve
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D42251
(cherry picked from commit f3ff0918ffcdbcb4c39175f3f9be70999edb14e8)
The sysctl knob 'vm.pmap.pv_entry_max' has been a loader tunable since
7ff48af704 (Allow a specific setting for pv entries), but it is fetched
from the kernel environment variable 'vm.pmap.pv_entries'. That is
inconsistent and obscure.
This reverts 36e1b9702e (Correct the tunable name in the message).
PR: 231577
Reviewed by: jhibbits, alc, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42274
(cherry picked from commit 02320f64209563e35fa371fc5eac94067f688f7f)
There is no symbol named 'mac_veriexec_get_executable_flags'; the correct
one is the function 'mac_veriexec_metadata_get_executable_flags()'.
Reviewed by: stevek
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42133
(cherry picked from commit f34c9c4e3bdc2b8bffae4ac26897e0e847e9f76f)
#15405 ea30b5a9e Set spa_ccw_fail_time=0 when expanding a vdev
Fixes 5-minute autoexpand delay on ZFS-root VM images.
Requested by: cperciva
Co-Authored-By: Colin Percival <cperciva@FreeBSD.org>
Obtained from: OpenZFS
OpenZFS commit: ea30b5a9e0d266baa13398ed8f9435de050f4b25
This is a temporary solution to fix the PR before the release.
During the 15.0 cycle, symlink handling between VFS and the namecache
needs to be refactored.
PR: 273414
Reported by: Vincent Milum Jr, Dan Kotowski, glebius
Tested by: Dan Kotowski, glebius
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41806
MFC after: 3 days
(cherry picked from commit bb8ecf259f96510b9c2146d846403393543061b7)
To match the sysctl MIBs and document entries in security(7).
Fixes: 2dec2b4a34 amd64: flush L1 data cache on syscall return with an error
Fixes: 17edf152e5 Control for Special Register Buffer Data Sampling mitigation
Reviewed by: kib
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D42249
(cherry picked from commit afbb8041a0633c97acb51ac895c9ae3cde4fe540)
This patch fixes the UTF-8 sequence validation logic in
teken_utf8_bytes_to_codepoint() and fixes the fallback behaviour in
ttydisc_rubchar() when an invalid UTF-8 sequence is encountered. The code
previously used __bitcount() to extract the sequence length from the
leading byte. However, counting all set bits breaks for certain code
points that have additional bits set in the upper half of the leading
byte (e.g. Cyrillic characters). This led to incorrect behaviour when
deleting those characters using backspace. The code now counts the
number of consecutive set bits in the leading byte starting from the
MSB, as per RFC 3629.
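To see why a population count is the wrong measure, take the Cyrillic letter U+0430, encoded as 0xD0 0xB0. The leading byte 0xD0 is 11010000 in binary: it has three set bits but only two consecutive leading ones, and the latter is the sequence length. A standalone sketch of both computations (function names are hypothetical, not teken's):

```c
#include <assert.h>
#include <stdint.h>

/* Wrong measure: counts every set bit in the byte. */
static int
popcount8(uint8_t b)
{
	int n = 0;

	for (; b != 0; b &= (uint8_t)(b - 1))
		n++;
	return (n);
}

/*
 * Right measure: counts consecutive set bits starting from the MSB.
 * Per RFC 3629 this is 0 for ASCII, 1 for a continuation byte
 * (10xxxxxx), and the sequence length (2..4) for a valid leading byte.
 */
static int
leading_ones8(uint8_t b)
{
	int n = 0;

	while (n < 8 && (b & (0x80u >> n)) != 0)
		n++;
	return (n);
}
```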
Reviewed by: christos
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42147
(cherry picked from commit 2fed1c579c52d63b72fc08ffcc652ba0183f9254)
The use of bitcount() triggered a build error because it couldn't be
located. __bitcount(), on the other hand, is defined in sys/types.h,
which is included by teken/teken.h.
MFC after: 2 weeks
(cherry picked from commit 6d3296f16a06bcaa49918799e683936711dcf9c9)
This patch adds additional logic in ttydisc_rubchar() to properly handle
backspace behaviour for UTF-8 characters.
Currently, typing a backspace after a UTF-8 character deletes only
one byte of its byte sequence, leaving garbled output in the tty's
output queue. With this change, all of the character's bytes are deleted.
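The byte-counting step can be sketched as follows: walk back from the end of the input over UTF-8 continuation bytes (10xxxxxx) to the leading byte, check that the leading byte announces exactly that many bytes, and erase them all. `utf8_rub_len()` is a hypothetical helper, and its fallback of erasing a single byte on malformed input is an assumption; the real ttydisc_rubchar() also accounts for column widths via teken_wcwidth(), which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many bytes at the end of buf[0..len) make up the last
 * character, or 1 if the tail is not a well-formed UTF-8 sequence.
 */
static size_t
utf8_rub_len(const unsigned char *buf, size_t len)
{
	unsigned char lead;
	size_t n = 0, want;

	if (len == 0)
		return (0);
	/* Count up to 3 trailing continuation bytes (max sequence is 4). */
	while (n < len && n < 3 && (buf[len - 1 - n] & 0xC0) == 0x80)
		n++;
	if (n == len)
		return (1);	/* only continuation bytes: malformed */

	lead = buf[len - 1 - n];	/* candidate leading byte */
	if ((lead & 0x80) == 0x00)
		want = 1;
	else if ((lead & 0xE0) == 0xC0)
		want = 2;
	else if ((lead & 0xF0) == 0xE0)
		want = 3;
	else if ((lead & 0xF8) == 0xF0)
		want = 4;
	else
		return (1);	/* invalid leading byte */
	return (want == n + 1 ? want : 1);
}
```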
This change is only active when the IUTF8 flag is set (see
19054eb6053189144aa962b2ecc1bf5087758a3e "(s)tty: add support for IUTF8
input flag")
The code uses the teken_wcwidth() function to properly handle character
column widths for different code points, and adds the
teken_utf8_bytes_to_codepoint() function that converts a UTF-8 byte
sequence to a codepoint, as specified in RFC3629.
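A standalone sketch of such a conversion, following the bit layout in RFC 3629. `utf8_to_codepoint()` is a hypothetical name and its signature is not necessarily that of teken_utf8_bytes_to_codepoint(). It rejects malformed sequences, overlong encodings, UTF-16 surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the code point, or -1 if bytes[0..len) is not one valid sequence. */
static int32_t
utf8_to_codepoint(const unsigned char *bytes, size_t len)
{
	/* Smallest code point that legitimately needs 1..4 bytes. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t cp;
	size_t need;

	if (len == 0)
		return (-1);
	if (bytes[0] < 0x80) {
		need = 1;
		cp = bytes[0];
	} else if ((bytes[0] & 0xE0) == 0xC0) {
		need = 2;
		cp = bytes[0] & 0x1F;
	} else if ((bytes[0] & 0xF0) == 0xE0) {
		need = 3;
		cp = bytes[0] & 0x0F;
	} else if ((bytes[0] & 0xF8) == 0xF0) {
		need = 4;
		cp = bytes[0] & 0x07;
	} else
		return (-1);
	if (len != need)
		return (-1);
	for (size_t i = 1; i < need; i++) {
		if ((bytes[i] & 0xC0) != 0x80)
			return (-1);	/* not a continuation byte */
		cp = (cp << 6) | (bytes[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates, and out-of-range values. */
	if (cp < min_cp[need] || (cp >= 0xD800 && cp <= 0xDFFF) ||
	    cp > 0x10FFFF)
		return (-1);
	return ((int32_t)cp);
}
```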
Reported by: christos
Reviewed by: christos, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42067
(cherry picked from commit 9e589b0938579f3f4d89fa5c051f845bf754184d)
This patch adds the necessary kernel and stty code to support setting
the IUTF8 flag for ttys. It is the first of two patches that fix
backspace behaviour for UTF-8 encoded characters when in canonical mode.
Reported by: christos
Reviewed by: christos, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D42066
(cherry picked from commit 128f63cedc14ae21b35f74e11e2fe1a5659c58e8)
At least KMSAN relies on zero-initialization of AP PCPU regions, see
commit 4b136ef259.
Prior to commit af1c6d3f30 these were allocated with allocpages() in
the amd64 pmap, which always returns zero-initialized memory.
Reviewed by: kib
Fixes: af1c6d3f30 ("amd64: do not leak pcpu pages")
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42241
(cherry picked from commit a37e484d049758c70f2d61be0d28a115b6f2f01e)
When recvmsg(2) is used with MSG_TRUNC on an atomic socket type (DGRAM
or SEQPACKET), soreceive_generic() and uipc_peek_dgram() may
intentionally underflow uio_resid so that userspace can find out how
many bytes it should have asked for.
If this happens, and KTR_GENIO is enabled, ktrgenio() will attempt to
copy in beyond the end of the output buffer's iovec. In general this
will silently cause the ktrace operation to fail since it'll result in
EFAULT from uiomove(). Let's be more careful and make sure not to try
and copy more bytes than we have.
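The arithmetic behind the bound: with MSG_TRUNC the residual count can go negative, so `requested - resid` overstates how many bytes actually landed in the caller's buffer. A minimal sketch of the clamp; `genio_copy_len()` is a hypothetical helper, and the actual change in the ktrace path may differ in detail.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Bytes that can safely be copied back out of a 'buflen'-byte buffer
 * after an I/O that started with 'requested' bytes and finished with
 * residual 'resid'.  'resid' may be negative when MSG_TRUNC is used.
 */
static size_t
genio_copy_len(size_t buflen, ssize_t requested, ssize_t resid)
{
	ssize_t done = requested - resid;	/* may exceed the buffer */

	if (done <= 0)
		return (0);
	if ((size_t)done > buflen)
		return (buflen);	/* never read past the iovec */
	return ((size_t)done);
}
```

For example, a 16-byte receive of a 100-byte datagram leaves `resid` at -84; the naive length would be 100, the clamped one 16.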
Fixes: be1f485d7d ("sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg().")
Reported by: syzbot+30b4bb0c0bc0f53ac198@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42099
(cherry picked from commit 761ae1ce798add862d78728cc5ac5240ce7db779)
As of LLVM 16, -fsanitize-memory-param-retval is the default. It yields
significantly smaller code, but the KMSAN runtime interceptors need to
be updated to stop checking shadow state of parameters. Apply a minimal
workaround for now.
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
(cherry picked from commit b6c653c9746342b373af01979319b3cb123b2872)
It was observed that netvsc's send rings could stall on the latest
Azure Boost platforms. This is because the vmbus_rxbr_read() routine
doesn't check whether the host is waiting for more room to put data,
which leaves the host side sleeping forever on this vmbus channel. The
problem was only observed on the latest platforms because the host
requests more free buffer ring space, which makes the issue much more
likely to occur.
Fix this by adding a check in vmbus_rxbr_read() and signaling the host
in the callers if the check returns true.
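A simplified model of such a check. It assumes the host advertises how much free space it is waiting for, and that the reader signals only when a read moves free space from below that threshold to at or above it, so the host is not signaled on every read. The structure, fields, and function are hypothetical, not those of the FreeBSD vmbus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the ring state shared with the host. */
struct ring_sketch {
	uint32_t size;		/* total data area in bytes */
	uint32_t used;		/* bytes currently queued for the guest */
	uint32_t pending_send;	/* free space the host waits for; 0 = none */
};

/*
 * Consume 'n' bytes and report whether the host should be signaled:
 * only on the transition from "not enough room" to "enough room".
 */
static bool
ring_read_sketch(struct ring_sketch *r, uint32_t n)
{
	uint32_t before_free, after_free;

	assert(n <= r->used);
	before_free = r->size - r->used;
	r->used -= n;
	after_free = r->size - r->used;

	if (r->pending_send == 0)
		return (false);		/* host is not waiting */
	return (before_free < r->pending_send &&
	    after_free >= r->pending_send);
}
```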
Reported by: NetApp
Tested by: whu
Sponsored by: Microsoft
(cherry picked from commit 49fa9a64372b087cfd66459a20f4ffd25464b6a3)
We must treat the aarch32 FP registers as 16 128-bit registers and store
them as the first 16 aarch64 FP registers.
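In AArch32 each 128-bit Q register overlaps two 64-bit D registers, with D(2n) as the low half and D(2n+1) as the high half of Qn, so the 32 D registers fit exactly into 16 128-bit slots. A sketch of the packing with hypothetical types (`struct vreg128`, `pack_vfp_sketch()`), not the actual register-set structures; zeroing the unused upper slots is a choice made by this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register slot. */
struct vreg128 {
	uint64_t lo;
	uint64_t hi;
};

/* Pack AArch32 D0..D31 as Q0..Q15 into AArch64 slots V0..V15. */
static void
pack_vfp_sketch(const uint64_t d[32], struct vreg128 v[32])
{
	for (int q = 0; q < 16; q++) {
		v[q].lo = d[2 * q];	/* D(2q) is the low half of Qq */
		v[q].hi = d[2 * q + 1];	/* D(2q+1) is the high half */
	}
	/* V16..V31 have no AArch32 counterpart; this sketch zeroes them. */
	for (int q = 16; q < 32; q++)
		v[q].lo = v[q].hi = 0;
}
```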
PR: 267788
(cherry picked from commit ccd0f34d8585cba727dd17a381309855af655b82)
Move creation of the watchdog process from just before we configure the
interrupt config hook into the config hook itself. This prevents it
from racing the config intr hook and doing an extra reset of the
card. This extra reset is usually harmless, but sometimes it can prevent
discovery of devices if done at just the wrong time. This can lead to no
disks being registered in a box full of disks, for example. Starting it
later eliminates this race, making discovery reliable.
Reviewed by: imp
(cherry picked from commit 7e02c7074c4c6df77b860e0dbcd032a2ea04b98b)
Problem: Under certain I/O conditions, a program doing large block disk
reads can cause a controller to crash.
Root Cause: The SCSI read request and destination address in the BDMA
descriptor are incorrect, causing the BDMA engine in the controller to
assert.
Fix: Change the alignment for creating bus_dma_tags in the driver from
PAGE_SIZE (4k) to 1, which allows the controller to manage its own
address range for BDMA transactions.
Risk: Medium
Exposure: This reverts a change first made to support NVMe drives on
Excalibur. At that time a 4k alignment was necessary. This no longer
seems to be the case.
PR: 259541
Reported by: Ka Ho Ng <khng@freebsd.org>
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D41619
(cherry picked from commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee)
pqisrc_free_device frees the device softc with the OS spinlock
held. This causes crashes when devices are removed because the memory
free might sleep (which is prohibited with spin locks held). Drop the
spinlock before releasing the memory.
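The general pattern is to detach the object while holding the spin lock and release its memory only after dropping the lock. A userspace sketch with a minimal C11 spin lock and a hypothetical device list; in the kernel the lock would be a spin mutex and the release a malloc(9)-family call, neither of which is modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Minimal spin lock built on a C11 atomic flag. */
static atomic_flag dev_lock = ATOMIC_FLAG_INIT;

static void
spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void
spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

struct dev_sketch {
	int			 id;
	struct dev_sketch	*next;
};

/* Hypothetical device list protected by dev_lock. */
static struct dev_sketch *dev_head;

static void
dev_remove(int id)
{
	struct dev_sketch **pp, *victim = NULL;

	spin_lock(&dev_lock);
	for (pp = &dev_head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			victim = *pp;
			*pp = victim->next;	/* unlink while holding the lock */
			break;
		}
	}
	spin_unlock(&dev_lock);

	/* Free only after dropping the lock; a kernel free may sleep. */
	free(victim);
}
```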
MFC After: 2 days
PR: 273289
Reviewed by: imp
(cherry picked from commit b064a4c9eed5b1dd2a40fc4fd2cb7e738b681547)
The loader tunable 'vm.numa.disabled' does not have a corresponding sysctl
MIB entry. Add it so that it can be retrieved, and `sysctl -T` will also
report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit c415cfc8be1b732a80f1ada6d52091e08eeb9ab5)
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have a corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit a55fbda874db31b804490567c69502c891b6ff61)
The loader tunable 'debug.kmsan.disabled' does not have a corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit 1d2b743784f7527a6840fe35ddb7e34cd41bc17a)
The loader tunable 'debug.kasan.disabled' does not have a corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit db5d0bc868be669ed6588ebeccf8c02e76aabc41)
The loader tunable 'kern.boottrace.table_size' does not have a
corresponding sysctl MIB entry. Add it so that it can be retrieved,
and `sysctl -T` will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit 51dc362d1a148362dc4cfacaa3629db928523204)
This addresses the issues of pf_rule_times leaking in case of stateless
rules and in case of state creation failures, like hitting the state
limit.
Reviewed by: kp
MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42169
(cherry picked from commit 4d19eceaefb7106d761bc9504bb0da737ae0d674)
cr_canseeotheruids(), cr_canseeothergids() and cr_canseejailproc()
should not be used directly now. cr_bsd_visible() has to be called
instead.
Reviewed by: mhorne
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40629
(cherry picked from commit 91e9d669b475d1900e8dc01a49ad90a621c4a068)
UEFI v2.10 Section 5.3 documents that the reserved space after the
GPT header must be at least 16 KiB. Enforce this minimum. Before, we'd only
set the number of entries to be the unpadded size. gpart's selective
enforcement of aspects of the GPT standard meant that these images would
work, but couldn't be changed (to add a partition or grow the size of a
partition). This ensures that gpart's overly picky standards don't cause
problems for people wishing to, for example, resize release images.
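The arithmetic, assuming the common 128-byte partition entry size: 16384 / 128 = 128 entries, which is 32 sectors at 512 bytes per sector or 4 sectors at 4096. A sketch of padding the advertised entry count accordingly; the function name is hypothetical and mkimg's actual code may differ.

```c
#include <assert.h>
#include <stdint.h>

#define GPT_ENT_SIZE	128u	/* bytes per partition entry (assumed) */
#define GPT_MIN_RESERVE	16384u	/* minimum bytes for the entry array */

/*
 * Number of partition entries to advertise for 'nparts' partitions so
 * that the entry array fills whole sectors and spans at least 16 KiB.
 */
static uint32_t
gpt_padded_entries(uint32_t nparts, uint32_t secsz)
{
	uint32_t per_sec = secsz / GPT_ENT_SIZE;
	uint32_t bytes = nparts * GPT_ENT_SIZE;
	uint32_t sectors;

	assert(per_sec > 0);
	if (bytes < GPT_MIN_RESERVE)
		bytes = GPT_MIN_RESERVE;
	sectors = (bytes + secsz - 1) / secsz;	/* round up to whole sectors */
	return (sectors * per_sec);
}
```

With three partitions on 512-byte sectors, the unpadded array would fit in a single sector (4 entries); padding raises it to 128 entries, leaving room for later edits with gpart.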
MFC after: 1 day (we want this in 14.0)
PR: 274312
Sponsored by: Netflix
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D42245
(cherry picked from commit 9b42d3e12ffc6896fcb4e60c1b239ddf60705831)
It was a bad idea to have composite clocks directly manage gates.
All clock drivers have been rewritten to not use this functionality
and to export the gate directly. We can now remove this code.
(cherry picked from commit db34f02028f30bbf099bf1bce7ce66184f51b332)