Reviewed by: Alexander Ziaee <concussious@runbox.com>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48063
(cherry picked from commit b6f4027ad9a2ede69a7ec11137cc4ea69ec2f0a0)
This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label. Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges). This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones. For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials. Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them. It
could be started and ended by a special system call. Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).
We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often. It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves. The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible. MAC/do is
going to leverage these in a subsequent commit. A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone
inclusion of <sys/ucred.h> both from kernel and userspace (for the
latter, it has been working thanks to <bsm/audit.h> already including
<sys/types.h>).
Reviewed by: brooks
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47618
(cherry picked from commit ddb3eb4efe55e57c206f3534263c77b837aff1dc)
The SUS doesn't mention this error code as a possible one [1]. The FreeBSD
manual page specifies a possible ECONNRESET for close(2):
[ECONNRESET] The underlying object was a stream socket that was
shut down by the peer before all pending data was
delivered.
In the past it had been EINVAL (see 21367f630d), and this EINVAL was
added as a safety measure in 623dce13c6. After conversion to
ECONNRESET it had been documented in the manual page in 78e3a7fdd5, but
I bet wasn't ever tested to actually be ever returned, cause the
tcp-testsuite[2] didn't exist back then. So documentation is incorrect
since 2006, if my bet wins. Anyway, in the modern FreeBSD the condition
described above doesn't end up with ECONNRESET error code from close(2).
The error condition is reported via SO_ERROR socket option, though. This
can be checked using the tcp-testsuite, temporarily disabling the
getsockopt(SO_ERROR) lines using sed command [3]. Most of these
getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm
that close(2) doesn't return ECONNRESET even on a socket that has the
error stored, neither it is returned in the case described in the manual
page. The latter case is covered by multiple tests residing in tcp-
testsuite/state-event-engine/rcv-rst-*.
However, the deleted block of code could be entered in a race condition
between close(2) and processing of incoming packet, when connection had
already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK.
This was reported in the bug 146845. With the block deleted, we will
continue into tcp_disconnect() which has proper handling of INP_DROPPED.
The race explanation follows. The connection is in TCPS_LAST_ACK. The
network input thread acquires the tcpcb lock first, sets INP_DROPPED,
acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED.
Meanwhile, the syscall thread goes through sodisconnect() which checks for
SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on
the tcpcb lock in tcp_usr_disconnect(). Once input thread releases the
lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.
- Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
- Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()
Note that the lockless operation in sodisconnect() isn't correct, but
enforcing the socket lock there will not fix the problem.
[1] https://pubs.opengroup.org/onlinepubs/9799919799/
[2] https://github.com/freebsd-net/tcp-testsuite
[3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)
PR: 146845
Reviewed by: tuexen, rrs, imp
Differential Revision: https://reviews.freebsd.org/D48148
(cherry picked from commit 053a988497342a6fd0a717cc097d09c23f83e103)
- Use a typical tagged list for the open flags instead of a literal
block. This permits using markup in the flag descriptions. Also,
drop the offset to avoid indenting the entire list.
- Note that O_RESOLVE_BENEATH only applies to openat(2)
- Use a clearer description of O_CLOEXEC (what it means, not the
internal flag it sets)
- Note that exactly one permission flag is required.
- Split up a paragraph on various flags so that each flag gets its own
paragraph. Some flags already had their own paragraph, so this is
more consistent. It also makes it clearer which flag a sentence is
talking about when a flag has more than one sentence.
- Appease some errors from igor and man2ps
- In the discussion about a returned directory descriptor opened with
O_SEARCH, avoid the use of Fa fd since the descriptor in question is
a return value and not an argument to open or openat.
- Various and sundry markup and language tweaks
Reviewed by: kib, emaste
Differential Revision: https://reviews.freebsd.org/D48253
(cherry picked from commit 826509a3c3642db6a110f8f43ae8860c40c72ad2)
- Use consistent language to describe user values unchanged by the
kernel.
- Replace passive language with active in a few places.
- Add a history note for kqueuex() and kqueue1().
- Add an MLINK and synopsis for kqueue1().
- Various wording and markup tweaks.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D48203
(cherry picked from commit 9b1585384d53d0f9cc4585a6efd8cc95116407d7)
Clarify the RETURN VALUES section with improved structure,
the condition of the return value 0, and the setting of errno.
PR: 174581
Reviewed by: jhb, ziaee
Approved by: mhorne (mentor)
Differential Revision: https://reviews.freebsd.org/D48955
(cherry picked from commit 571df2c64a3c1af1fe011303ec08e391e887ecbc)
The SO_SETFIB option can be used to set a socket's FIB number, but there
is no way to retrieve it. Rename SO_SETFIB to SO_FIB and implement a
handler for it for getsockopt(2).
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48834
(cherry picked from commit ee951eb59f2136a604e3fbb12abf8d8344da0c99)
len is unsigned (it is size_t), so cannot be negative.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit fab411c4fd5224e3dd44e0eb288d60b27480e2d1)
We previously claimed that non-page-aligned addresses would return
EINVAL, but the address is in fact rounded down to the page boundary.
Reported by: Harald Eilertsen <haraldei@anduin.net>
Reviewed by: brooks
Sponsored by: The FreeBSD Foundation
Fixes: dabee6fecc ("kern_descrip.c: add fdshare()/fdcopy()")
Differential Revision: https://reviews.freebsd.org/D48465
(cherry picked from commit 9e36aaf0c24cf158e83c69c1d2312c000c3c36f3)
- Add some missing .Pp macros after the end of literal blocks and some
lists to ensure there is a blank line before the following text.
- Use an indent of Ds for nested lists to reduce excessive indentation and
make the bodies of the nested list items easier to read.
- Various and sundry rewordings and clarifications.
Reviewed by: kib, emaste
Differential Revision: https://reviews.freebsd.org/D47782
(cherry picked from commit 8277c790179304159c2e4dcb1d99552518d5be8e)
These were reported by `mandoc -T lint` as
ERROR: skipping unknown macro
When these pages were rendered with `man`, the "unknown macro" meant
that the entire line was omitted from the output.
Obvious typos in:
lib/libsys/swapon.2
lib/libsys/procctl.2
share/man/man9/firmware.9
lib/libcasper/services/cap_net/cap_net.3: 'mode' describes a function
argument.
lib/libsys/statfs.2: there's no .Tm command ("trademark?"), and
.Tn ("tradename") is deprecated, so remove the macro entirely.
usr.sbin/mfiutil/mfiutil.8: man was interpreting '/dev/' as a macro
(which it didn't recognize).
share/man/man4/qat.4: same issue as above, but with '0'. In this case,
given the context of the previous line, rewriting as "Value '0'"
seemed more appropriate.
usr.sbin/mlx5tool/mlx5tool.8: typo in .Xr
Signed-off-by: Graham Percival <gperciva@tarsnap.com>
Sponsored by: Tarsnap Backup Inc.
Reviewed by: concussious, imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1417
(cherry picked from commit 2878d99dfcfbdd7a415a7f31cf95fbd53fc8e581)
Moved from libsys to libc for stable/14.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47556
(cherry picked from commit 36887e04947fedfebb9b648fadd0dd6cc03142ea)
With this patch, it is possible to call fchmod() on a unix socket prior
to binding it to the filesystem namespace, so that the mode is set
atomically. Without this, one has to call chmod() after bind(), leaving
a window where threads can connect to the socket with the default mode.
After bind(), fchmod() reverts to failing with EINVAL.
This interface is copied from Linux.
The behaviour of fstat() is unmodified, i.e., it continues to return the
mode as set by soo_stat().
PR: 282393
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D47361
(cherry picked from commit bfd03046d18776ea70785ca1ef36dfc60822de3b)
Also remove some information from HISTORY that is no longer needed (and
could be confusing), now that _Fork is part of a standard.
Reported by: kib
Reviewed by: imp, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47588
(cherry picked from commit 566c039d1e7555343fcf6439a10e56f5a632c0fe)
struct kld_file_stat embeds a reference to MAXPATHLEN, defined in
param.h.
PR: 280432
MFC after: 2 weeks
(cherry picked from commit f44029e322446469f116bbd26d51ba857083bacb)
These were reported by `mandoc -T lint ...` as warnings:
- unusual Xr order
- unusual Xr punctuation
Fixes made by script in https://github.com/Tarsnap/freebsd-doc-scripts
Signed-off-by: Graham Percival <gperciva@tarsnap.com>
Reviewed by: mhorne, Alexander Ziaee <concussious.bugzilla@runbox.com>
Sponsored by: Tarsnap Backup Inc.
Pull Request: https://github.com/freebsd/freebsd-src/pull/1464
(cherry picked from commit 6e1fc0118033f42b7c0d3623c8f67a89ebecabb2)
These were reported by `mandoc -T lint ...` as errors.
fhlink.2, fhreadlink.2: remove unneeded block closing.
getfh.2, procctl.2: add necessary block closing.
ptrace.2: -width only takes one argument.
swapon.2: <sys/vmparam.h> and <vm/swap_pager.h> weren't being displayed,
because .It is for a list item whereas .In is for included files.
Also, we want a blank line between <sys/ > headers and the other
one.
Signed-off-by: Graham Percival <gperciva@tarsnap.com>
PR: 281597
Reviewed by: mhorne
Sponsored by: Tarsnap Backup Inc.
(cherry picked from commit 650056363baddb83c61c85b0539ee536f3d4b56c)
Add a minimal membarrier man page that documents the available cmd
values and errors that can be returned. We can add more information and
iterate on it in the tree.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46967
(cherry picked from commit 1fc766e3b41d0cdbd166ef95258434069a90ca52)
(cherry picked from commit 92cd5abb64dd70c305535c9504c6a2b73552147f)
(cherry picked from commit 8b41e693fc3956385d5771d60ee93e18001a5a0d)
This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket. This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.
The interface is copied from OpenBSD, and this implementation aims to be
compatible. Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket. In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets. More concretely, when
setting the option one passes the following struct:
struct splice {
int fd;
off_t max;
struct timveval idle;
};
where "fd" refers to the socket to which the first socket is to be
spliced, and two setsockopt(SO_SPLICE) calls are required to set up a
bi-directional splice.
select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer. Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.
A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period. Once unspliced,
the socket behaves normally from userspace's perspective. The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again. Userspace can also manually trigger
unsplicing by splicing to -1.
Splicing work is handled by dedicated threads, similar to KTLS. A
worker thread is assigned at splice creation time. At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.
Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket. so_splice_xfer() does the hard
work of moving data between socket buffers.
Co-authored by: gallatin
Reviewed by: brooks (interface bits)
MFC after: 3 months
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D46411
(cherry picked from commit a1da7dc1cdad8c000622a7b23ff5994ccfe9cac6)
Make the system call honor `AT_SYMLINK_NOFOLLOW`.
Also enable this from `linux_faccessat2` where the issue arised the first time.
Update manual pages accordingly.
PR: 275295
Reported by: kenrap@kennethraplee.com
Approved by: kib@
Differential Revision: https://reviews.freebsd.org/D46267
(cherry picked from commit 5ab6ed93cd3680f8b69dd4d05823f4740a2bdef9)
access(), eaccess() and faccessat() will always dereference
symbolic links.
So add a note in the manual page, that lstat(2) should be
used in the case of symbolic links.
PR: 262895
Reviewed by: gbe, pauamma_gundo.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44890
(cherry picked from commit 421025a274fb5759b3ecc8bdb30b24db830b45ae)
The CLOCK_* constants are "defined variable or preprocessor constants"
and so use .Dv.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45106
(cherry picked from commit 2d29d2ecebf8ea19221995b3ea2e3a7ac700bf81)
As kib@ noted:
> Obviously gettimeofday(2) is not going to be removed
> even in the far future.
Reported by: kib
Fixes: 4395d3ced5cf Document that gettimeofday() is obsolescent
MFC after: 3 days
(cherry picked from commit 6662c2312e956439652ce2d06b42753b6a78fc61)
Change .Xr reference to .Fn, which quiets a mandoc warning.
Reviewed by: mhorne
MFC after: 3 days
Pull Request: https://github.com/freebsd/freebsd-src/pull/1135
(cherry picked from commit d3de1bd429bc51fbbcb37fadaf2581461edf848b)
In the errno list, add an explicit note and reference to the note in the
STANDARDS section.
When O_NOFOLLOW is specified and the target is a symbolic link FreeBSD
sets errno to a value different than that specified by POSIX. Commit
295159dfa3 added a note to this effect, but I missed it when reading
through the list of errno values.
PR: 214633
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43618
(cherry picked from commit ea6a6b63e1fd304e790c8ed7627caf5e3ba52bc7)