The debug assertions added in commit 34740937f7 do not hold for a
window in STAILQ_SWAP, so check whether the queues are empty directly.
Reported by: ler
Fixes: 34740937f7 ("queue: New debug macros for STAILQ")
Switch to INTR_ROOT_COUNT as this name better describes its purpose.
Remove the default INTR_ROOT_IRQ from the core. Define it (redundantly)
in each architecture's header, but now placed alongside its sibling
values (if defined by the platform, e.g. arm64 INTR_ROOT_FIQ).
Reviewed by: mhorne
Pull Request: https://github.com/freebsd/freebsd-src/pull/1280
This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label. Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges). This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones. For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials. Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them. It
could be started and ended by a special system call. Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).
We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often. It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves. The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible. MAC/do is
going to leverage these in a subsequent commit. A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone
inclusion of <sys/ucred.h> both from kernel and userspace (for the
latter, it has been working thanks to <bsm/audit.h> already including
<sys/types.h>).
Reviewed by: brooks
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47618
<sys/types.h> is currently brought in by other headers but some of its
type definition are directly used in this header, so it should appear
explicitly. It is necessary as lots of prototypes in there use types it
defines (<sys/_types.h> wouldn't be enough).
Sort header inclusions as per style(9).
No functional change (intended).
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47881
As a process really changes credentials at the moment proc_set_cred() or
proc_unset_cred() is called, these functions are the proper locations to
perform the update of the new and old real users' process count (using
chgproccnt()).
Before this change, change_ruid() instead would perform that update,
although it operates only on a passed credential which is a priori not
tied to the calling process (or not to any process at all). This was
arguably a flaw of commit b1fc0ec1a7, r77183, based on its commit
message, and in particular the portion "(...) In each case, the call now
acts on a credential not a process (...)".
Fixing this makes using change_ruid() more natural when building
candidate credentials that in the end are not applied to a process,
e.g., because of some intervening privilege check. Also, it removes
a hack around this unwanted process count change in unionfs.
We also introduce the new proc_set_cred_enforce_proc_lim() so that
callers can respect the per-user process limit, and will use it for the
upcoming setcred(). We plan to change all callers of proc_set_cred() to
call this new function instead at some point. In the meantime, both
proc_set_cred() and the new function will coexist.
As detailed in some proc_set_cred_enforce_proc_lim()'s comment, checking
against the process limit is currently flawed as the kernel doesn't
really maintain the number of processes per UID (besides RLIMIT_NPROC,
this in fact also applies to RLIMIT_KQUEUES, RLIMIT_NPTS, RLIMIT_SBSIZE
and RLIMIT_SWAP). The applied limit is currently that of the old real
UID. Root (or a process granted with PRIV_PROC_LIMIT) is not subject to
this limit.
Approved by: markj (mentor)
Fixes: b1fc0ec1a7
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46923
This flag is used in field 'cr_flags', which is never directly visible
outside the kernel. That field is however exported through 'struct
kinfo_proc' objects (field 'ki_cr_flags'), either from the kernel via
sysctls or from libkvm, and is supposed to contain exported flags
prefixed with KI_CRF_ (currently, KI_CRF_CAPABILITY_MODE and
KI_CRF_GRP_OVERFLOW, this second one being a purely userland one
signaling overflow of 'ki_groups').
Make sure that KI_CRF_CAPABILITY_MODE is the flag actually exported and
tested by userland programs, and hide the internal CRED_FLAG_CAPMODE.
As both flags are currently defined to the same value, this doesn't
change the KBI, but of course does change the KPI. A code search via
GitHub and Google fortunately doesn't reveal any outside uses for
CRED_FLAG_CAPMODE.
While here, move assignment of 'ki_uid' to a more logical place in
kvm_proclist(), and definition of XU_NGROUPS as well in 'sys/ucred.h'
(no functional/interface changes intended).
Reviewed by: mhorne
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46909
Same as SYSCTL_JAIL_PARAM_SYS_NODE() but allowing another level of
hierarchy. To be used with MAC policies, so that they can have their
own node under "security.jail.param.mac".
Reviewed by: jamie
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46901
This fixes inclusion from userspace, and also kernel inclusion for
source files not including explicitly nor implicitly <sys/types.h>.
Reviewed by: jamie
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46896
The new STAILQ_ASSERT_EMPTY() macro allows callers to assert that some
STAILQ is empty. It leverages the new QMD_STAILQ_CHECK_EMPTY() internal
macro.
QMD_STAILQ_CHECK_EMPTY() is a check for empty STAILQ, where heads's
'stqh_last' field must point to the 'stqh_first' one. Use it in
STAILQ_ASSERT_EMPTY().
QMD_STAILQ_CHECK_TAIL() checks that the tail pointed by 'head' does not
have a next element. It is similar to the already existing
QMD_TAILQ_CHECK_TAIL(), but without the superfluous 'field' argument and
clearer documentation. Use it in STAILQ_INSERT_TAIL().
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46889
In order to match reality, allow using these functions with pointers on
const objects, and bring us closer to C11.
Remove the '+' modifier in the atomic_load_acq_64_i586()'s inline asm
statement's constraint for '*p' (the value to load). CMPXCHG8B always
writes back some value, even when the value exchange does not happen in
which case what was read is written back. atomic_load_acq_64_i586()
further takes care of the operation atomically writing back the same
value that was read in any case. All in all, this makes the inline
asm's write back undetectable by any other code, whether executing on
other CPUs or code on the same CPU before and after the call to
atomic_load_acq_64_i586(), except for the fact that CMPXCHG8B will
trigger a #GP(0) if the memory address is part of a read-only mapping.
This unfortunate property is however out of scope of the C abstract
machine, and in particular independent of whether the 'uint64_t' pointed
to is declared 'const' or not.
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46887
This makes it clear that 'methods' (if not NULL) points to an array that
won't be modified. Internally, this array is indeed copied into the
given OSD type's larger array of all methods for all slots.
Reviewed by: jamie
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46886
Note in the manpage that the 2024 edition finally added ppoll(), and
also add the appropriate declarations for the correct versions of
_POSIX_C_SOURCE (via __POSIX_VISIBLE).
Differential Revision: https://reviews.freebsd.org/D48043
I added a third value for kern.logsigexit to mean 'auto' as an abundance
of caution, but I don't know how much it matters -- that can be easily
consolidated back to boolean-ish.
This is primarily targeted towards people running test suites under CI
(e.g. buildbot, jenkins). Oftentimes tests entail segfaults that are
expected, and logs get spammed -- this can be particularly high volume
depending on the application. Per-process control of this behavior is
desirable because they may still want to be logging legitimate
segfaults, so the system-wide atomic bomb kern.logsigexit=0 is not a
great option.
This adds a process flag to disable it, controllable via
procctl(2)/proccontrol(1); the latter knows it as "sigexitlog" due to
its length, but it's referred to almost everywhere else as
"sigexit_log."
Reviewed by: kib (earlier version), pstef
Differential Revision: https://reviews.freebsd.org/D21903
Note in the manpage that the 2024 edition finally added ppoll(), and
also add the appropriate declarations for the correct versions of
_POSIX_C_SOURCE.
Differential Revision: https://reviews.freebsd.org/D48043
Summary: Export the kernel API pgrp_calc_jobc for use by other modules or functions.
Reviewed By: kib
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D47539
This was left over from before using atomic(9) consistently. It is
unneeded as these loads are already ordered correctly.
Reviewed by: alc, kib
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D48002
These were typo'd at creation, and they should've used the common
TDA_ prefix. Switch them now. No compat shims because these are niche
enough that it seems unlikely that they've seen large adoption under the
wrong name.
OK'd by: kib
Fixes: c6d31b83 ("AST: rework")
We no longer need a memory barrier on the compare and set operations.
If these fail then there are appropriate barriers on the loads earlier
in the loop, and if they succeed then no other threads can be accessing
the br_ring entry so any store to it is safe.
Reviewed by: alc
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D46381
This message will be included in any warning issued by the compiler
for use of the deprecated function.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47701
These correspond to the current implementations of
bus_generic_(probe|attach|detach) but with more accurate names and
semantics. The intention is to deprecate bus_generic_(probe|attach)
and reimplement bus_generic_detach in a future commit.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47673
Sometimes we need defines from this file in assembler code. Today we do
the heavyweight approach of using genassym for that. However, they are
just #defines, so in the future we want to include sys/intr.h to pick up
the needed constants in exception.S.
PR: 283041
Sponsored by: Netflix
Reviewed by: mmel, andrew
Differential Revision: https://reviews.freebsd.org/D47846
This is similar to chroot(2), but takes a file descriptor instead
of path. Same syscall exists in NetBSD and Solaris. It is part of a larger
patch to make absolute pathnames usable in Capsicum mode, but should
be useful in other contexts too.
Reviewed By: brooks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41564
It may be the case that we want to avoid delivering signals that are
normally blocked by the thread's signal mask, in which case the syscall
should schedule this one instead to restore the mask prior to delivery.
This will be used by pselect/ppoll to avoid delivering signals that were
supposed to be blocked after the timeout has elapsed. The name was
chosen as this is the expected behavior of pselect/ppoll, while late
restoration of the mask is exceptional behavior for these specific
calls.
__FreeBSD_version bump as later TDA_* values have changed, third-party
modules that may be using MOD3/MOD4 need to be rebuilt.
Reviewed by: kib
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47741
The livedumper triggers reports from both of these sanitizers since it
necessarily accesses uninitialized or freed memory. Add a flag to
silence reports from both sanitizers.
Reviewed by: mhorne, khng
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47714
This block has a lot of nesting, not helped by two adjacent nested
blocks involving _POSIX_C_SOURCE, with only the inner one commented,
looking like it's the end of the outer one. Comment the outer one as
well so it's not quite so hard to figure out.
MFC after: 1 week
Disable the IP/IP6/ICMP/... counter probe points by default.
They are kept enabled in debug builds, and can be enabled with
'options KDTRACE_MIB_SDT'.
Requested by: glebius
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47657
struct cdev has members of type struct timespec. Include sys/_timespec.h
to so we don't need to rely on namespace pollution to define it.
Sponsored by: Netflix
The jobid (which was stored in kernelinfo) was used to look up
jobs until 1ce9182407, where it became essentially write only.
Remove it to simplify the code and pave the way for future work
to make aio scale better.
Note this has been slated for removal "soon" for 18 years.
Suggested by: jhb
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D47583
Various System V Interface Definition editions, as well as the X/Open
group portability guide issue 4, recommend defining _XOPEN_SOURCE and
broadly intimating it means the same thing as _POSIX_SOURCE == 2.
Starting in X/Open issue 5 (1995), _XOPEN_SOURCE needs to be defined to
be 500 to bring in the newer interfaces. However, it is still common hat
sources define _XOPEN_SOURCE to be blank. To deal with that, we subtract
0 from _XOPEN_SOURCE to make the other expressions well formed.
While here, document that we should define _POSIX_C_SOURCE to be 199209
based on the SVID, the first version of the Single Unix Specification,
and X/Open CAE issue 4, version 2. Also document that historically this
has been a NOP. Any value of _XOPEN_SOURCE < 500 (including it being
blank) was not viewed as a request for a restricted namespace.
Reviewed by: brooks (earlier version)
Differential Revision: https://reviews.freebsd.org/D47584
Sponsored by: Netflix
Bump default to POSIX at 202405, C at 2023 and xopen at 800...
Sponsored by: Netflix
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D47578