Commit graph

1969 commits

Author SHA1 Message Date
Alan Somers
022ca2fc7f Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by:    jhb, kib, bcr
Relnotes:       yes
Differential Revision: https://reviews.freebsd.org/D27743
2021-01-02 19:57:58 -07:00
Rick Macklem
d189a74dfd copy_file_range(2): add recommendation to use large "len"
PR#252358 reported a serious performance problem w.r.t.
cp(1) when copying large non-sparse files.
This problem appears to have been caused by cp(1)
calling copy_file_range(2) with a small "len" argument.

This patch adds a recommendation to use a large "len"
value where possible, for performance reasons.

Reviewed by:	asomers
Differential Revision:	https://reviews.freebsd.org/D27935
2021-01-02 17:21:21 -08:00
Konstantin Belousov
58b2ed4672 eventfd.2: Add the mail address of the submitter into copyright.
Requested by:	rgrimes
MFC after:	13 days
2020-12-28 21:03:16 +02:00
Konstantin Belousov
6d075fd9a5 Document eventfd().
Submitted by:   greg@unrelenting.technology
Reviewed by:    bcr, markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Konstantin Belousov
0ef405eee9 kqueue(2): Use .Fo instead .Ft
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-12-27 12:57:26 +02:00
Konstantin Belousov
44c5db52e2 Add eventfd(3) wrappers to libc.
eventfd_read/write one-liners are from musl libc.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Guangyuan Yang
a1d7836752 mmap(2): Update .Dd missed in the last commit
PR:		252097
MFC after:	1 week
2020-12-24 14:14:56 +00:00
Guangyuan Yang
81720dbab2 mmap(2): Fix a typo
PR:             252097
MFC after:      1 week
Reported by:    Nick Frampton <nick.frampton@akips.com>
2020-12-24 14:08:34 +00:00
Gordon Bergling
f6d234d870 libc: Fix most issues reported by mandoc
- varios "new sentence, new line" warnings
- varios "sections out of conventional order" warnings
- varios "unusual Xr order" warnings
- varios "missing section argument" warnings
- varios "no blank before trailing delimiter" warnings
- varios "normalizing date format" warnings

MFC after:	1 month
2020-12-19 14:54:28 +00:00
Enji Cooper
0c424c64f8 cpuset{,_getaffinity,_getdomain}.2: fix SEE ALSO
Sort by manpage section, then sort entries alphabetically.

This makes the manpages `make manlint` clean.

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-11 01:52:27 +00:00
Enji Cooper
92d4164179 aio_suspend.2: properly canonicalize .Dd
Months should be fully spelled as their local-specific equivalents: in this
case `Oct` should have been spelled like `October`.

Reported by:	make manlint
MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-11 00:28:28 +00:00
Enji Cooper
20daf0ca6e cap_enter(2): fix CAVEATS section
The CAVEATS section was misspelled as "CAVEAT" before this change. Fix the
spelling to identify issues related to the section.

Furthermore, given that the section order was incorrect, move the CAVEATS
section down to the bottom of the manpage, per the conventional section
order.

MFC after:	1 week
Reported by:	make manlint
Sponsored by:	DellEMC Isilon
2020-12-11 00:26:49 +00:00
Kyle Evans
e04a83a3e1 _umtx_op(2): document recent addition of 32bit compat flags
This was part of D27325.

Reviewed by:	kib
2020-12-09 03:20:51 +00:00
Enji Cooper
00107a56e5 extattr_get_file(20: bump .Dd
This is being done for the formatting and context changes. While the net content
hasn't been changed, the content/context changes were sufficient to warrant the
date bump.

MFC after:	1 week
MFC with:	r368431, r368433, r368434, r368435
Sponsored by:	DellEMC Isilon
2020-12-08 04:18:16 +00:00
Enji Cooper
cf681016d4 extattr_get_file(2): clarify RETURN VALUES
While some of the syscalls' behavior were documented and implied in the
RETURN VALUES section by earlier, e.g., the DESCRIPTION sections, as having
behavior of the other calls (`*_fd` vs `*_file` vs `*_link`), there was a lot
of implied return value behavior in the section prior to this change.

Explicitly document the syscall behavior per the current implementation in
sys/kern/vfs_extattr.c so others can better develop based on its explicit
documented behavior instead of having to digest the context of the manpage to
understand the appropriate behavior.

MFC after:	1 week
MFC with:	r368431, r368433, r368434
Sponsored by:	DellEMC Isilon
2020-12-08 04:16:05 +00:00
Enji Cooper
f705523939 extattr_get_file(2): fix more formatting
- Remove an unnecessary trailing comma separating a two-item clause.
- Sort more function calls alphabetically (in the same vein as r368433).

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 04:05:19 +00:00
Enji Cooper
e8e0f91b8b extattr_get_file(2): sort syscalls alphabetically
Although some sections of the manpage sort the syscalls alphabetically, many
core areas of the manpage do not. Sort the syscalls so it is easier to pick out
functional changes and to improve manpage readability.

This formatting change is also being done to make future functional changes
easier to spot.

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 04:01:03 +00:00
Enji Cooper
403b2124d4 lio_listio(2): fix manlint error
The date with .Dd prior to this change isn't canonically spelled out: it
should have been "December", not "Dec".

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 03:48:05 +00:00
Enji Cooper
9d610b1516 extattr_get_fd(2): fix manlint errors
- The CAVEATS section was misspelled as "CAVEAT".
- The CAVEATS section should come before the "BUGS" section and after
  other existing sections by convention.

MFC after:	1 week
Reported by:	make manlint
Sponsored by:	DellEMC Isilon
2020-12-08 03:43:00 +00:00
Kyle Evans
231f59920a _umtx_op: document UMTX_OP_SEM2_WAIT copyout behavior
This clever technique to get a time remaining back was added to support sem_clockwait_np.

Reviewed by:	kib, vangyzen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27160
2020-11-17 03:26:56 +00:00
Edward Tomasz Napierala
bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Alan Cox
76c7af51ab Revise the description of MAP_STACK. In particular, describe the guard
in more detail.

Reviewed by:	bcr, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26908
2020-10-27 18:08:33 +00:00
John-Mark Gurney
0fda26dbb3 update write(2)'s iovec limit w/ info about the iosize_max_clamp sysctl... 2020-10-26 00:37:31 +00:00
Konstantin Belousov
69c09181d4 mmap(2): Document guard size for MAP_STACK and related EINVAL.
Based on submission by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26894
2020-10-21 21:40:33 +00:00
Gordon Bergling
3d265fce43 Fix a few mandoc issues
- skipping paragraph macro: Pp after Sh
- sections out of conventional order: Sh EXAMPLES
- whitespace at end of input line
- normalizing date format
2020-10-09 19:12:44 +00:00
Warner Losh
61c4a6f317 Updates to chroot(2) docs
1. Note what settings give historic behavior
2. Recommend jail under security considerations.
2020-09-29 18:13:54 +00:00
Konstantin Belousov
1f305be431 Document {O,AT}_RESOLVE_BENEATH and new O_BENEATH behavior for relative paths.
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:54:54 +00:00
Mark Johnston
3d1098617b Fix error checking in shm_create_largepage().
Reviewed by:	alc, kib
Reported by:	Coverity
MFC with:	r365524
Differential Revision:	https://reviews.freebsd.org/D26464
2020-09-18 12:30:15 +00:00
Kyle Evans
8b8cf4ece6 memfd_create: simplify HUGETLB support a little bit
This also fixes a minor issue that was missed in the initial review; the
layout of the MFD_HUGE_* flags is actually not 1:1 bit:flag -- it instead
borrowed the Linux convention of how this is laid out since it was
originally implemented on Linux, the top 6 bits represent the shift required
for the requested page size.

This allows us to remove the flag <-> pgsize mapping table and simplify the
logic just prior to validation of the requested page size.

While we're here, fix two small nits:

- HUGETLB memfd shouldn't exhibit the SHM_GROW_ON_WRITE behavior. We can
  only grow largepage shm by appropriately aligned (i.e. requested pagesize)
  sizes, so it can't work in the typical/sane fashion. Furthermore, Linux
  does the same, so let's be compatible.

- We don't allow MFD_HUGETLB without specifying a pagesize, so no need to
  check for that later.

Reviewed by:	kib (slightly earlier version)
2020-09-11 02:02:15 +00:00
Kyle Evans
9bf2b80ca6 memfd_create: fix return values
Literally returning EINVAL from a function designed to return an fd makes
for interesting scenarios.

I cannot assign enough pointy hats to cover this one.
2020-09-10 21:25:16 +00:00
Kyle Evans
944174e7bf Fix memfd_create tests after r365524
r365524 did accidentally invert this check that sets SHM_LARGEPAGE, leading
non-hugetlb memfd as unconfigured largepage shm and thus test failures when
we try to ftruncate or write to them.

PR:		249236
Discussed with:	kib
2020-09-10 17:23:30 +00:00
Konstantin Belousov
3ef55e8f25 Add shm_create_largepage(3) helper for creation and configuration of
largepage shm objects.

And since we can, add memfd_create(MFD_HUGETLB) support, hopefully
close enough to the Linux feature.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 22:20:36 +00:00
Kyle Evans
69112cca60 getlogin_r: fix the type of len
getlogin_r is specified by POSIX to to take a size_t len, not int. Fix our
version to do the same, bump the symbol version due to ABI change and
provide compat.

This was reported to break compilation of Ruby 2.8.

Some discussion about the necessity of the ABI compat did take place in the
review. While many 64-bit platforms would likely be passing it in a 64-bit
register and zero-extended and thus, not notice ABI breakage, some do
sign-extend (e.g. mips).

PR:		247102
Submitted by:	Bertram Scharpf <software@bertram-scharpf.de> (original)
Submitted by:	cem (ABI compat)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26335
2020-09-09 18:07:13 +00:00
Mark Johnston
847ab36bf2 Include the psind in data returned by mincore(2).
Currently we use a single bit to indicate whether the virtual page is
part of a superpage.  To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl.  MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26238
2020-09-02 18:16:43 +00:00
Alan Somers
e3f1731aec [skip ci] document close_range(2) as async-signal-safe
Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25513
2020-07-21 16:46:40 +00:00
Gordon Bergling
e7677232d6 lseek(2): Document the seek behavior better and update the POSIX compliance
In certain situations lseek(2) will return successful although if no seek
was performed. This can happen when operating on devices that don't support
seeking (older tape drives) or when operating on changeable media devices
(such as DVD or Blu-ray devices) without a medium inserted.

Document this within the man page and update the POSIX compliance while here.

PR:		162765
Submitted by:	arundel@
Reported by:	arundel@
Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25646
2020-07-13 15:52:57 +00:00
Allan Jude
0e3972bc19 procctl(2): consistently refer to the last agrument as 'data'
Some older references called it 'arg'

Also fix a syntax error that was underlining an entire sentence.

PR:		247386
Reported by:	Paul Floyd <paulf@free.fr>, PauAmma (research)
MFC after:	2 weeks
Sponsored by:	Klara Inc.
2020-07-11 18:04:09 +00:00
Kyle Evans
423a033ba7 memfd_create: turn on SHM_GROW_ON_WRITE
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:45:16 +00:00
Mark Johnston
fe59cb6ba2 Apply the logic from r363051 to semctl(2) and __sem_base field.
Reported by:	Jeffball <jeffball@grimm-co.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 18:34:54 +00:00
Mark Johnston
f4f16af1d3 Avoid copying out kernel pointers from msgctl(IPC_STAT).
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace.  Thus, NULL them out before
copying out, and remove references to them from the manual page.

Reported by:	Jeffball <jeffball@grimm-co.com>
Reviewed by:	emaste, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 17:26:49 +00:00
Warner Losh
f045cfb816 Chroot actually appeared in 7th Edition Unix.
Chroot appeared during the development of 7th edition Unix. The FreeBSD jail
documents, incorrectly, that Bill Joy added this to 4.2BSD on 18 March
1982. That was when Bill Joy converted from a statically coded system call glue
to dynamically generated assembler. Chroot was present in 32V, 3BSD, 4.0BSD, 4.1BSD
and 4.1cBSD well in advance of this. Kirk McKusick agrees with this analysis.

See also:
	V7: https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/sys/chroot.s
	32V: https://minnie.tuhs.org/cgi-bin/utree.pl?file=32V/usr/src/libc/sys/chroot.s
	3BSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/src/libc/sys/chroot.s
	4BSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/src/libc/sys/chroot.s
	4.1cBSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4.1cBSD/usr/src/libc/sys/chroot.s

The 6th and earlier editions do not have this system call, nor do they have
anything named chroot in the trees available from TUHS.

Reviewed by: allanjude@
Differential Revision: https://reviews.freebsd.org/D25475
2020-06-26 22:05:23 +00:00
Pawel Biernacki
eb8a06388c man page of select(2) should mention pselect(2)
Reviewed by:	bcr (manpages), kib, trasz
Approved by:	kib (mentor)
MFC after:	7 days
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25169
2020-06-25 12:31:05 +00:00
Mateusz Piotrowski
3b3e9cfb1b Fix a typo in cpuset_getdomain.2
PR:		247385
Reported by:	Paul Floyd <paulf free.fr>
MFC after:	1 week
2020-06-18 19:03:20 +00:00
Gordon Bergling
421f325efc libcasper(3): Document HISTORY within the manpages
Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
MFC after:		7 days
Differential Revision:	https://reviews.freebsd.org/D24695
2020-06-16 16:48:52 +00:00
Gordon Bergling
e0f7c06de2 libc manpages: various improvements from NetBSD
- Add STANDARDS and HISTORY sections within the appropriate manpages
- Mention two USENIX papers within kqueue(2) and strlcpy(3)

Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
Obtained from:	NetBSD
MFC after:	7 days
Differential Revision: https://reviews.freebsd.org/D24650
2020-06-14 05:59:30 +00:00
Konstantin Belousov
6cf8fba381 procctl(2): document PROC_KPTI
Reviewed by:	bcr
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25258
2020-06-13 18:19:42 +00:00
Konstantin Belousov
7e54fea1d1 procctl(2): consistently refer to the data pointer as 'data'.
Reviewed by:	bcr
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25258
2020-06-13 18:18:34 +00:00
Kyle Evans
63619b6dba vfs: add restrictions to read(2) of a directory [2/2]
This commit adds the priv(9) that waters down the sysctl to make it only
allow read(2) of a dirfd by the system root. Jailed root is not allowed, but
jail policy and superuser policy will abstain from allowing/denying it so
that a MAC module can fully control the policy.

Such a MAC module has been written, and can be found at:
https://people.freebsd.org/~kevans/mac_read_dir-0.1.0.tar.gz

It is expected that the MAC module won't be needed by many, as most only
need to do such diagnostics that require this behavior as system root
anyways. Interested parties are welcome to grab the MAC module above and
create a port or locally integrate it, and with enough support it could see
introduction to base. As noted in mac_read_dir.c, it is released under the
BSD 2 clause license and allows the restrictions to be lifted for only
jailed root or for all unprivileged users.

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:17:25 +00:00
Kyle Evans
dcef4f65ae vfs: add restrictions to read(2) of a directory [1/2]
Historically, we've allowed read() of a directory and some filesystems will
accommodate (e.g. ufs/ffs, msdosfs). From the history department staffed by
Warner: <<EOF

pdp-7 unix seemed to allow reading directories, but they were weird, special
things there so I'm unsure (my pdp-7 assembler sucks).

1st Edition's sources are lost, mostly. The kernel allows it. The
reconstructed sources from 2nd or 3rd edition read it though.

V6 to V7 changed the filesystem format, and should have been a warning, but
reading directories weren't materially changed.

4.1b BSD introduced readdir because of UFS. UFS broke all directory reading
programs in 1983. ls, du, find, etc all had to be rewritten. readdir() and
friends were introduced here.

SysVr3 picked up readdir() in 1987 for the AT&T fork of Unix. SysVr4 updated
all the directory reading programs in 1988 because different filesystem
types were introduced.

In the 90s, these interfaces became completely ubiquitous as PDP-11s running
V7 faded from view and all the folks that initially started on V7 upgraded
to SysV. Linux never supported this (though I've not done the software
archeology to check) because it has always had a pathological diversity of
filesystems.
EOF

Disallowing read(2) on a directory has the side-effect of masking
application bugs from relying on other implementation's behavior
(e.g. Linux) of rejecting these with EISDIR across the board, but allowing
it has been a vector for at least one stack disclosure bug in the past[0].

By POSIX, this is implementation-defined whether read() handles directories
or not. Popular implementations have chosen to reject them, and this seems
sensible: the data you're reading from a directory is not structured in some
unified way across filesystem implementations like with readdir(2), so it is
impossible for applications to portably rely on this.

With this patch, we will reject most read(2) of a dirfd with EISDIR. Users
that know what they're doing can conscientiously set
bsd.security.allow_read_dir=1 to allow read(2) of directories, as it has
proven useful for debugging or recovery. A future commit will further limit
the sysctl to allow only the system root to read(2) directories, to make it
at least relatively safe to leave on for longer periods of time.

While we're adding logic pertaining to directory vnodes to vn_io_fault, an
additional assertion has also been added to ensure that we're not reaching
vn_io_fault with any write request on a directory vnode. Such request would
be a logical error in the kernel, and must be debugged rather than allowing
it to potentially silently error out.

Commented out shell aliases have been placed in root's chsrc/shrc to promote
awareness that grep may become noisy after this change, depending on your
usage.

A tentative MFC plan has been put together to try and make it as trivial as
possible to identify issues and collect reports; note that this will be
strongly re-evaluated. Tentatively, I will MFC this knob with the default as
it is in HEAD to improve our odds of actually getting reports. The future
priv(9) to further restrict the sysctl WILL NOT BE MERGED BACK, so the knob
will be a faithful reversion on stable/12. We will go into the merge
acknowledging that the sysctl default may be flipped back to restore
historical behavior at *any* point if it's warranted.

[0] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:10.ufs.asc

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
MFC after:	1 month (note the MFC plan mentioned above)
Relnotes:	absolutely, but will amend previous RELNOTES entry
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:09:55 +00:00
Ed Maste
3f65edb369 mmap.2: correct prot argument terminology
One of the error descriptions referred to permissions; in context the
meaning was probably clear, but the prot values are properly called
protections.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-06-03 20:42:52 +00:00