Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D49096
Two different functions in different files do the same thing - fill a
partial page with zeroes. Add that functionality to vm_page.c and
remove it elsewhere to avoid code duplication.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D49096
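The shared operation is zeroing a byte range inside one page. The following userland sketch illustrates only that idea. The function name `page_zero_partial()` and the fixed 4 KiB page size are assumptions. The real vm_page.c helper also has to busy and map the vm_page_t, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical helper: zero the bytes [base, base + size) of one page.
 * The range must lie entirely within the page.
 */
static void
page_zero_partial(unsigned char *page, size_t base, size_t size)
{
	assert(base <= SKETCH_PAGE_SIZE && size <= SKETCH_PAGE_SIZE - base);
	memset(page + base, 0, size);
}
```

A typical caller would zero the tail of the last page after a truncation, e.g. `page_zero_partial(pg, off, SKETCH_PAGE_SIZE - off)`.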
Certain NFSv4.1 callbacks are not currently supported/used
by the FreeBSD client. Without this patch, NFS4ERR_NOTSUPP
is the reply for these callbacks. Since NFSv4.1 does not specify
all of these callbacks as optional, I think it is preferable
to reply NFS_OK or NFS4ERR_REJECT_DELEG instead of NFS4ERR_NOTSUPP.
This patch changes the reply status for these unsupported
callbacks, which the client has no use for.
I am not aware of any NFSv4.1 servers that will perform
any of these callbacks against the FreeBSD client at this time.
MFC after: 2 weeks
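The following sketch shows one way to choose such a reply status. The text above does not list which callbacks are covered, so the grouping here is an assumption. CB_PUSH_DELEG, which offers the client a delegation, is declined with NFS4ERR_REJECT_DELEG. A few purely informational callbacks are acknowledged with NFS4_OK, the RFC name for what the text calls NFS_OK. Operation numbers and status values follow RFC 8881. `unused_cb_status()` is a hypothetical name.

```c
#include <assert.h>

/* Callback operation numbers (nfs_cb_opnum4 in RFC 8881). */
enum cb_op {
	OP_CB_GETATTR = 3,
	OP_CB_RECALL = 4,
	OP_CB_LAYOUTRECALL = 5,
	OP_CB_NOTIFY = 6,
	OP_CB_PUSH_DELEG = 7,
	OP_CB_RECALL_ANY = 8,
	OP_CB_RECALLABLE_OBJ_AVAIL = 9,
	OP_CB_RECALL_SLOT = 10,
	OP_CB_SEQUENCE = 11,
	OP_CB_WANTS_CANCELLED = 12,
	OP_CB_NOTIFY_LOCK = 13,
	OP_CB_NOTIFY_DEVICEID = 14,
};

/* Status codes (nfsstat4 in RFC 8881). */
#define NFS4_OK			0
#define NFS4ERR_NOTSUPP		10004
#define NFS4ERR_REJECT_DELEG	10085

/*
 * Hypothetical reply status for callbacks the client does not act on.
 * Callbacks the client really handles are dispatched elsewhere and are
 * not expected to reach the default case.
 */
static int
unused_cb_status(enum cb_op op)
{
	switch (op) {
	case OP_CB_PUSH_DELEG:
		/* Decline the offered delegation instead of "not supported". */
		return (NFS4ERR_REJECT_DELEG);
	case OP_CB_NOTIFY:
	case OP_CB_RECALLABLE_OBJ_AVAIL:
	case OP_CB_WANTS_CANCELLED:
	case OP_CB_NOTIFY_LOCK:
		/* Informational only for this client: acknowledge, ignore. */
		return (NFS4_OK);
	default:
		return (NFS4ERR_NOTSUPP);
	}
}
```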
Commit f5aff1871d and 7e26f1c210 moved the delegation
and layout high water variables into the clientID structure.
This patch uses those variables to implement the
CB_RECALL_ANY NFSv4.1/4.2 callback.
This patch only affects NFSv4.1/4.2 mounts to non-FreeBSD
NFS servers that use CB_RECALL_ANY. The Linux knfsd is
one example of such a server.
MFC after: 2 weeks
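The following userland C sketch shows one plausible shape for this. CB_RECALL_ANY asks the client to keep no more than a given number of recallable objects. The sketch assumes that the handler lowers the per-clientID high water marks to that number and reports how many objects are now in excess, leaving the actual returns to existing code. The struct and function names are hypothetical. The real callback's per-type bitmap is reduced to two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-clientID (per-mount) counters and limits. */
struct clientid_sketch {
	unsigned int deleg_cnt;		/* delegations currently held */
	unsigned int deleg_highwater;	/* target maximum */
	unsigned int layout_cnt;	/* layouts currently held */
	unsigned int layout_highwater;	/* target maximum */
};

/*
 * Sketch of CB_RECALL_ANY: the server wants the client to keep at most
 * "keep" objects of each requested kind.  Lower the high water marks and
 * return how many objects the client should now give back.
 */
static unsigned int
recall_any(struct clientid_sketch *clp, unsigned int keep, bool delegs,
    bool layouts)
{
	unsigned int excess = 0;

	if (delegs) {
		if (keep < clp->deleg_highwater)
			clp->deleg_highwater = keep;
		if (clp->deleg_cnt > keep)
			excess += clp->deleg_cnt - keep;
	}
	if (layouts) {
		if (keep < clp->layout_highwater)
			clp->layout_highwater = keep;
		if (clp->layout_cnt > keep)
			excess += clp->layout_cnt - keep;
	}
	return (excess);
}
```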
Commit f5aff1871d moved the delegation high water
variables into the clientID structure, so that they are now
per mount instead of global. This patch does the
same for the layout highwater variables. It happens
that the layout highwater variables are not actually
used. This patch changes the code to use them.
This is needed to add support
for the CB_RECALL_ANY callback in a future commit.
This patch only affects NFSv4.1/4.2 mounts with the "pnfs"
mount option. The effect on these mounts will be minimal,
since layouts are returned when they are stale and this
normally ensures that the highwater mark is never hit.
MFC after: 2 weeks
Without this patch, the variables used to maintain a high
water limit for delegations are global and apply to all
mounts. This patch moves them into the clientID structure,
which makes them per mount. This is needed to add support
for the CB_RECALL_ANY callback in a future commit.
The only effect of this patch is an increase in the
total number of delegations held if there are multiple NFSv4
mounts to NFSv4 servers with delegations enabled.
Since the default of NFSCLDELEGHIGHWATER is fairly small,
this should not have a significant impact.
MFC after: 2 weeks
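The following is a minimal sketch of the per-mount accounting. The names are hypothetical, and the default value standing in for NFSCLDELEGHIGHWATER is an assumption. Because each mount carries its own counter and limit, one busy mount no longer pushes another mount over a shared limit. This is also why the total number of delegations held across mounts can grow.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default, standing in for NFSCLDELEGHIGHWATER. */
#define SKETCH_DELEGHIGHWATER 100

/* Hypothetical per-mount state; previously these were globals. */
struct mount_clientid {
	unsigned int delegcnt;
	unsigned int deleghighwater;
};

static void
clientid_init(struct mount_clientid *clp)
{
	clp->delegcnt = 0;
	clp->deleghighwater = SKETCH_DELEGHIGHWATER;
}

/*
 * Record a newly granted delegation.  Return true if this mount is now
 * over its own high water mark and should start returning unused ones.
 */
static bool
deleg_granted(struct mount_clientid *clp)
{
	clp->delegcnt++;
	return (clp->delegcnt > clp->deleghighwater);
}
```

With N such mounts, up to N times the high water mark of delegations may be held in total.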
The callback CB_RECALL_SLOT is required for NFSv4.1/4.2.
Fortunately, there do not appear to be any extant
NFSv4.1/4.2 servers that use it. Since commit b97a478896
fixed handling of session slot shrinking, this patch
adds support for CB_RECALL_SLOT, which shrinks the
number of session slots as well.
MFC after: 2 weeks
When a NFSv4.1/4.2 server reduces the size of the slot table
for a session as indicated by a smaller value for sr_target_highest_slot
in a Sequence reply, the sequence numbers for the slots no
longer in use must be re-initialized. This is needed since the
slot table may be grown again by the server later.
The RFC did not make clear that the sequence numbers need to be
re-initialized when the slot table shrinks and then grows again,
but this has now been confirmed as correct behaviour.
The patch adds the code that does this re-initialization.
I am not currently aware of a NFSv4.1/4.2 server where the
session slots fail if this is not done, but there may be such
a case.
MFC after: 2 weeks
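The following sketch shows the re-initialization. The slot table layout and names are hypothetical. The one protocol assumption is that the first request sent on a fresh slot uses sequence ID 1, as described in RFC 8881. Resetting a dropped slot to that value lets the slot be reused cleanly if the server grows the table again.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXSLOTS 64	/* assumed table size for this sketch */

/* Hypothetical client-side session slot table. */
struct slot_table {
	uint32_t nextseq[SKETCH_MAXSLOTS];	/* seqid for next use */
	unsigned int highest_slot;		/* highest slot in use */
};

static void
slot_table_init(struct slot_table *st, unsigned int highest)
{
	assert(highest < SKETCH_MAXSLOTS);
	for (unsigned int i = 0; i < SKETCH_MAXSLOTS; i++)
		st->nextseq[i] = 1;	/* fresh slot: first seqid is 1 */
	st->highest_slot = highest;
}

/*
 * Apply a smaller target highest slot (e.g. sr_target_highest_slot from
 * a SEQUENCE reply).  Slots above the new limit are reset so that they
 * restart at seqid 1 if the server later grows the table again.
 */
static void
slot_table_shrink(struct slot_table *st, unsigned int target)
{
	if (target >= st->highest_slot)
		return;			/* not a shrink */
	for (unsigned int i = target + 1; i <= st->highest_slot; i++)
		st->nextseq[i] = 1;
	st->highest_slot = target;
}
```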
This seems to be the right place to set it once and for all, without
setting it deep in kgssapi/rpctls/etc leaf functions.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D48558
Richard Kojedzinszky reported an intermittent problem where
the Linux NFSv4.2 client would sometimes not see changes done
to a directory by another client, although the change attribute
for the directory had changed.
A test patch that added the change_attr_type attribute to the
server and always returned NFS4_CHANGE_TYPE_VERSION_COUNTER_NOPNFS
seems to have resolved the issue. Somewhat oddly, the Linux
knfsd server does not support this attribute but does not
seem to exhibit the stale caching problem.
This patch uses the VFCF_FILEREVINC flag on a file system (UFS, ZFS)
to return NFS4_CHANGE_TYPE_VERSION_COUNTER_NOPNFS. It also
returns NFS4_CHANGE_TYPE_TIME_METADATA if VFCF_FILEREVCT is set,
which may be useful for exported fuse file systems.
PR: 284186
Reported by: Richard Kojedzinszky <richard@kojedz.in>
Tested by: Richard Kojedzinszky <richard@kojedz.in>
MFC after: 2 weeks
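The flag-to-type mapping described above can be sketched directly. The enum values are those of change_attr_type4 in RFC 7862, which spells the names with an extra IS_ (e.g. NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS). The flag bit values are stand-ins. Returning NFS4_CHANGE_TYPE_UNDEFINED when neither flag is set is an assumption; the text does not state it.

```c
#include <assert.h>

/* change_attr_type4 values from RFC 7862, named as in the text above. */
enum change_attr_type {
	NFS4_CHANGE_TYPE_MONOTONIC_INCR = 0,
	NFS4_CHANGE_TYPE_VERSION_COUNTER = 1,
	NFS4_CHANGE_TYPE_VERSION_COUNTER_NOPNFS = 2,
	NFS4_CHANGE_TYPE_TIME_METADATA = 3,
	NFS4_CHANGE_TYPE_UNDEFINED = 4,
};

/* Stand-ins for the kernel's VFCF_* file system flags (values assumed). */
#define SKETCH_VFCF_FILEREVINC	0x1
#define SKETCH_VFCF_FILEREVCT	0x2

static enum change_attr_type
change_attr_type_for(unsigned int vfc_flags)
{
	if (vfc_flags & SKETCH_VFCF_FILEREVINC)	/* e.g. UFS, ZFS */
		return (NFS4_CHANGE_TYPE_VERSION_COUNTER_NOPNFS);
	if (vfc_flags & SKETCH_VFCF_FILEREVCT)	/* e.g. some fuse fs */
		return (NFS4_CHANGE_TYPE_TIME_METADATA);
	return (NFS4_CHANGE_TYPE_UNDEFINED);	/* assumed fallback */
}
```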
The deleg argument to nfscl_deleg() is a "struct nfscldeleg **"
although the returned pointer value is never used by callers.
This patch changes the argument to "struct nfscldeleg *" to
simplify the call and avoid any confusion w.r.t. use of the
returned value.
This patch should not create any NFS semantics change.
The grace period of slightly more than 2 minutes after the nfsd
is started is needed for normal operation. It allows client(s) to
recover open/lock state. However, for testing situations
where there are no client(s) to recover state, it introduces
an unacceptable delay.
The new per-vnet jail sysctl can be set non-zero to disable
the grace period. It should only be used for testing and
can be applied on a per-jail basis. It must be set before
the nfsd is started up.
Requested by: asomers
Tested by: asomers
Commit 026cdaa3b3 added a check for a nul or "/" in a file
name in a readdir reply. Unfortunately, the minimal testing
done on it did not detect a bug that can cause the client
to crash.
This patch fixes the code so that it does not crash.
Note that a NFS server will not normally return a file
name in a readdir reply that has a nul or "/" in it,
so the crash is unlikely.
PR: 283965
Reported by: asomers
Tested by: asomers
MFC after: 2 weeks
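For illustration, here is a bounds-safe form of such a check. It does not reproduce the actual fix. The wire name is length-counted rather than NUL-terminated, so the sketch inspects only the first `len` bytes with memchr(). `readdir_name_ok()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check on a name of "len" bytes from a readdir reply.
 * Only the first len bytes are examined; memchr() never reads past them.
 */
static bool
readdir_name_ok(const char *name, size_t len)
{
	if (memchr(name, '\0', len) != NULL)
		return (false);		/* embedded nul */
	if (memchr(name, '/', len) != NULL)
		return (false);		/* path separator */
	return (true);
}
```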
Fix a leak of a fuse_ticket structure. The leak mostly affected
NFS-exported fuse file systems, and was triggered by a failure during
FUSE_LOOKUP.
MFC after: 2 weeks
Sponsored by: ConnectWise
Change 'struct tmpfs_fid_data' to behave consistently with the private
structure other FSes use. In a nutshell, make it a full alias of
'struct fid', instead of just using it to fill 'fid_data'. This implies
adding a length field at start (aliasing 'fid_len' of 'struct fid'), and
filling 'fid_len' with the full size of the aliased structure.
To ensure that the new 'struct tmpfs_fid_data' is smaller than 'struct
fid', which the compile-time assert introduced in commit
91b5592a1e ("fs: Add static asserts for the size of fid
structures") checks (and thus was not strong enough when added), use
'__packed'.
A consequence of this change is that copying the 'struct tmpfs_fid_data'
into a stack-allocated variable becomes unnecessary; we simply rely on
the compiler emitting the proper code on seeing '__packed' (and on the
start of 'struct tmpfs_fid_data' being naturally aligned, which is
normally guaranteed by kernel's malloc() and/or inclusion in 'struct
fhandle').
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47956
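The following userland sketch shows the aliasing described above. `struct fid` mirrors the definition in sys/mount.h, where MAXFIDSZ is 16. The private structure's field names and 64-bit types are assumptions. In userland, `__attribute__((packed))` plays the role of the kernel's `__packed`, and `_Static_assert` performs the size check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAXFIDSZ 16	/* as in sys/mount.h */

/* Userland copy of the generic file ID from sys/mount.h. */
struct fid {
	unsigned short	fid_len;
	unsigned short	fid_data0;
	char		fid_data[MAXFIDSZ];
};

/* Sketch of a private fid that aliases the whole of struct fid. */
struct tmpfs_fid_data_sketch {
	unsigned short	tfd_len;	/* aliases fid_len */
	uint64_t	tfd_id;		/* inode number (type assumed) */
	uint64_t	tfd_gen;	/* generation (type assumed) */
} __attribute__((packed));

_Static_assert(sizeof(struct tmpfs_fid_data_sketch) <= sizeof(struct fid),
    "private fid must fit in struct fid");

static void
encode_fid(struct fid *fhp, uint64_t id, uint64_t gen)
{
	struct tmpfs_fid_data_sketch *tfd =
	    (struct tmpfs_fid_data_sketch *)fhp;

	tfd->tfd_len = sizeof(*tfd);	/* full size of the aliased struct */
	tfd->tfd_id = id;
	tfd->tfd_gen = gen;
}

static int
decode_fid(const struct fid *fhp, uint64_t *id, uint64_t *gen)
{
	const struct tmpfs_fid_data_sketch *tfd =
	    (const struct tmpfs_fid_data_sketch *)fhp;

	if (tfd->tfd_len != sizeof(*tfd))
		return (-1);		/* not a handle this sketch produced */
	*id = tfd->tfd_id;		/* packed: compiler handles alignment */
	*gen = tfd->tfd_gen;
	return (0);
}
```

The packed layout is 18 bytes, which fits in the 20-byte `struct fid`. Without packing, padding after `tfd_len` would push it to 24 bytes on LP64, and the assertion would fail.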
Flags should not propagate from the lower fs. Behavior of the upper fs
is determined by the flags in its own mount point structure. When the
lower fs acts according to its own mount configuration, this is reported
up as VOP errors.
PR: 283425
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D48150
The FUSE_NO_OPEN_SUPPORT and FUSE_NO_OPENDIR_SUPPORT flags
are only meant to indicate kernel features, and should be ignored
if they appear in the FUSE_INIT reply flags.
Also fix the corresponding test cases.
MFC after: 2 weeks
Reviewed by: Alan Somers <asomers@FreeBSD.org>
Signed-off-by: CismonX <admin@cismon.net>
Pull Request: https://github.com/freebsd/freebsd-src/pull/1509
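The following sketch shows the masking. The bit positions are taken from the FUSE kernel protocol header (fuse_kernel.h) as we understand it and should be checked against the copy in the tree. `fuse_init_reply_flags()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as in the FUSE protocol header (fuse_kernel.h). */
#define FUSE_NO_OPEN_SUPPORT	(1u << 17)
#define FUSE_NO_OPENDIR_SUPPORT	(1u << 24)

/*
 * These two bits describe kernel capabilities only, so a daemon setting
 * them in its FUSE_INIT reply is not requesting anything.  Drop them
 * before interpreting the remaining reply flags.
 */
static uint32_t
fuse_init_reply_flags(uint32_t reply_flags)
{
	return (reply_flags &
	    ~(FUSE_NO_OPEN_SUPPORT | FUSE_NO_OPENDIR_SUPPORT));
}
```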
As a process really changes credentials at the moment proc_set_cred() or
proc_unset_cred() is called, these functions are the proper locations to
perform the update of the new and old real users' process count (using
chgproccnt()).
Before this change, change_ruid() instead would perform that update,
although it operates only on a passed credential which is a priori not
tied to the calling process (or not to any process at all). This was
arguably a flaw of commit b1fc0ec1a7, r77183, based on its commit
message, and in particular the portion "(...) In each case, the call now
acts on a credential not a process (...)".
Fixing this makes using change_ruid() more natural when building
candidate credentials that in the end are not applied to a process,
e.g., because of some intervening privilege check. Also, it removes
a hack around this unwanted process count change in unionfs.
We also introduce the new proc_set_cred_enforce_proc_lim() so that
callers can respect the per-user process limit, and will use it for the
upcoming setcred(). We plan to change all callers of proc_set_cred() to
call this new function instead at some point. In the meantime, both
proc_set_cred() and the new function will coexist.
As detailed in a comment in proc_set_cred_enforce_proc_lim(), checking
against the process limit is currently flawed as the kernel doesn't
really maintain the number of processes per UID (besides RLIMIT_NPROC,
this in fact also applies to RLIMIT_KQUEUES, RLIMIT_NPTS, RLIMIT_SBSIZE
and RLIMIT_SWAP). The applied limit is currently that of the old real
UID. Root (or a process granted with PRIV_PROC_LIMIT) is not subject to
this limit.
Approved by: markj (mentor)
Fixes: b1fc0ec1a7
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46923
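The following is a simplified model of the new division of labour, using hypothetical names and a tiny UID space. Only installing a credential on a process moves the per-UID process count, so building a candidate credential (as change_ruid() now does) has no side effect. The enforcing variant checks the limit taken from the old real UID and exempts privileged callers, as described above. Returning EAGAIN on failure is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAXUID 8		/* tiny UID space for illustration */

/* Hypothetical per-UID process counts (what chgproccnt() maintains). */
static int proccnt[SKETCH_MAXUID];

struct cred_sketch {
	unsigned int ruid;
};

struct proc_sketch {
	struct cred_sketch *cred;
};

/* Install a credential, moving the count from the old to the new ruid. */
static void
set_cred(struct proc_sketch *p, struct cred_sketch *newcred)
{
	if (p->cred != NULL)
		proccnt[p->cred->ruid]--;
	proccnt[newcred->ruid]++;
	p->cred = newcred;
}

/*
 * Variant honouring a per-user process limit.  The limit passed in is
 * the old real UID's; privileged callers are not subject to it.
 */
static int
set_cred_enforce_lim(struct proc_sketch *p, struct cred_sketch *newcred,
    int old_ruid_limit, bool privileged)
{
	if (!privileged && p->cred != NULL &&
	    p->cred->ruid != newcred->ruid &&
	    proccnt[newcred->ruid] + 1 > old_ruid_limit)
		return (EAGAIN);	/* nothing changed */
	set_cred(p, newcred);
	return (0);
}
```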
Unusually, the FUSE_NOTIFY_INVAL_INODE and FUSE_NOTIFY_INVAL_ENTRY
messages are fully asynchronous. The server sends them to the kernel
unsolicited. That means that unlike every other fuse message coming
from the server, these two arrive at a potentially unbusied mountpoint.
So they must explicitly busy it. Otherwise a page fault could result if
the mountpoint were being unmounted.
Reported by: JSML4ThWwBID69YC@protonmail.com
MFC after: 2 weeks
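The following single-threaded sketch shows the pattern, with hypothetical names. `mnt_busy()` mimics vfs_busy(9) refusing once an unmount has begun; the ENOENT value is chosen to match vfs_busy(9) as we understand it. Real code must do this under the mount interlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for a struct mount. */
struct mount_sketch {
	bool unmounting;	/* unmount in progress */
	int busy;		/* busy references held */
};

/* Like vfs_busy(): refuse once an unmount has started. */
static int
mnt_busy(struct mount_sketch *mp)
{
	if (mp->unmounting)
		return (ENOENT);
	mp->busy++;
	return (0);
}

static void
mnt_unbusy(struct mount_sketch *mp)
{
	mp->busy--;
}

/*
 * Unsolicited FUSE_NOTIFY_INVAL_* handler: no in-flight request keeps
 * the mount busy, so take a reference for the duration of the work.
 */
static int
handle_notify_inval(struct mount_sketch *mp, int *work_done)
{
	int error;

	error = mnt_busy(mp);
	if (error != 0)
		return (error);		/* mount going away: drop notice */
	(*work_done)++;			/* placeholder for invalidation */
	mnt_unbusy(mp);
	return (0);
}
```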
Re-ordering the fields suppresses the trailing padding which was causing
the structure to overflow 'struct fid'.
While here, re-indent in a more visually pleasing way.
Reviewed by: rmacklem, emaste, markj
Approved by: markj (mentor)
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D47955
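The following small illustration shows why field order matters. The field names and types are illustrative, not the real tarfs layout. On a typical LP64 target, `struct fid_bad` is 24 bytes: 4 bytes of padding come before `ino` and 4 bytes after `gen`. `struct fid_good` is 16 bytes, and the generic `struct fid` is 20 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit field placed last: LP64 pads the struct after it. */
struct fid_bad {
	unsigned short	len;
	unsigned short	data0;
	uint64_t	ino;
	uint32_t	gen;
};

/* Same fields, with the 32-bit one moved before the 64-bit one. */
struct fid_good {
	unsigned short	len;
	unsigned short	data0;
	uint32_t	gen;
	uint64_t	ino;
};
```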
As the 'gen' field in 'struct tarfs_node' (and then 'struct tarfs_fid')
is filled with arc4random() which returns an unsigned int, change its
type in both structures. This allows reordering fields in 'struct
tarfs_fid' to reduce its size, finally avoiding the use of '__packed' to
ensure it fits into 'struct fid'.
While here, remove the 'data0' field which wasn't necessary from the
start.
Reviewed by: markj, rmacklem, des
Approved by: markj (mentor)
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D47954
This commit upgrades the FUSE API to protocol 7.32.
It doesn't implement any of protocol 7.32's new features.
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D48040
This permits the mask bits to control the upper 3 bits used for setuid,
setgid, and sticky permissions. While here, clarify the manpage language
as non-Rockridge volumes with extended attributes can also supply users
and groups along with permissions.
Reviewed by: olce
Fixes: 82f2275b73 cd9660: Add support for mask,dirmask,uid,gid options
Differential Revision: https://reviews.freebsd.org/D47357
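The following plain-function sketch assumes that the mask works as an upper bound on the permission bits and leaves the file type bits alone. `apply_mask()` is hypothetical. The point is only that the mask now covers 07777 instead of 0777.

```c
#include <assert.h>
#include <sys/types.h>

#define ALLPERMS_SKETCH	07777	/* setuid, setgid, sticky and rwx bits */

/*
 * Hypothetical: apply a "mask" mount option to a file mode.  Previously
 * only the low 9 bits (0777) were masked; here the mask also governs
 * setuid (04000), setgid (02000) and sticky (01000).
 */
static mode_t
apply_mask(mode_t mode, mode_t mask)
{
	return ((mode & ~(mode_t)ALLPERMS_SKETCH) |
	    (mode & mask & ALLPERMS_SKETCH));
}
```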
- cd_ino_t can be dropped since ino_t is now 64 bits wide.
- ISOFSMNT_ROOT is unused (and defined only for the kernel).
No functional change intended.
Reviewed by: olce, imp, kib, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47880
File system specific *fid structures are copied into the generic
struct fid defined in sys/mount.h.
As such, they cannot be larger than struct fid.
This patch packs the structure and checks its size via a _Static_assert().
Reported by: Kevin Miller <mas@0x194.net>
Reviewed by: olce, imp, kib, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47879
File system specific *fid structures are copied into the generic
struct fid defined in sys/mount.h.
As such, they cannot be larger than struct fid.
This patch packs the structure and checks its size via a _Static_assert().
Reviewed by: markj
MFC after: 2 weeks
File system specific *fid structures are copied into the generic
struct fid defined in sys/mount.h.
As such, they cannot be larger than struct fid.
This patch packs the structure and checks its size via a _Static_assert().
Reviewed by: markj
MFC after: 2 weeks
File system specific *fid structures are copied into the generic
struct fid defined in sys/mount.h.
As such, they cannot be larger than struct fid.
This patch adds _Static_assert()s to check for this.
ZFS and fuse already have _Static_assert()s.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47936
Without this patch, an all upper case user domain name
(as specified by nfsuserd(8)) would not work.
I believe this was done so that Kerberos realms were
not confused with user domains.
Now, RFC8881 specifies that the user domain name is a
DNS name. As such, all upper case names should work.
This patch fixes this case so that it works. The custom
comparison function is no longer needed.
PR: 282620
Tested by: jmmv
MFC after: 2 weeks
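The following minimal sketch makes the case-insensitive comparison with the standard strncasecmp(3). The function name is hypothetical. The domain received over the wire is treated as length-counted, so the lengths are compared first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/*
 * DNS names compare case-insensitively, so "EXAMPLE.COM" and
 * "example.com" name the same user domain.  "wire" holds wirelen bytes
 * and need not be NUL-terminated.
 */
static bool
user_domain_matches(const char *wire, size_t wirelen, const char *configured)
{
	return (strlen(configured) == wirelen &&
	    strncasecmp(wire, configured, wirelen) == 0);
}
```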
This will fail when mac_veriexec is enforced.
Move the check from procfs_doprocmem to proc_rwmem to ensure all
cases are covered.
Reviewed by: olce, markj
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D47484
REMOVE doesn't work properly in the face of hard links. Use UNLINKAT
instead, which is implemented by qemu and bhyve and lets the client
specify the name being removed.
PR: 282432
Reviewed by: dfr
Differential Revision: https://reviews.freebsd.org/D47438
We cannot unconditionally access nfsd's VNET variables in
'sys/kern/vfs_export.c' or 'sys/fs/nfsserver/nfs_nfsdsubs.c', as they
may not have been compiled in depending on build options.
So, forget about the extra mile of using the configured default group
and use the hardcoded GID_NOGROUP (which differs only on systems running
nfsuserd(8) and with a non-default GID for their "nogroup" group).
Reported by: rpokala, bapt (MINIMAL compile breakage)
Reported by: cy, David Wolfskill (panics caused by mountd(8))
Approved by: markj (mentor)
Fixes: cfbe7a62dc ("nfs, rpc: Ensure kernel credentials have at least one group")
This fixes several bugs where some 'struct ucred' in the kernel,
constructed from user input (via nmount(2)) or obtained from other
servers (e.g., gssd(8)), could have an unfilled 'cr_groups' field and
whose 'cr_groups[0]' (or 'cr_gid', which is an alias) was later
accessed, causing an uninitialized access giving random access rights.
Use crsetgroups_fallback() to enforce a fallback group when possible.
For NFS, the chosen fallback group is that of the NFS server in the
current VNET (NFSD_VNET(nfsrv_defaultgid)).
There does not seem to be any sensible fallback available in rpc code
(sys/rpc/svc_auth.c, svc_getcred()) on AUTH_UNIX (TLS or not), so just
fail credential retrieval there. Stock NSS sources, rpc.tlsservd(8) or
rpc.tlsclntd(8) provide non-empty group lists, so will not be impacted.
Discussed with: rmacklem (by mail)
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D46918
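The following is a simplified model of the fallback, using a hypothetical credential structure. The value 65533 for the "nogroup" GID is an assumption for this sketch. The point is that an empty group list is replaced by a single fallback group, so `groups[0]` (the effective GID) is always initialized. The RPC AUTH_UNIX path described above fails the retrieval instead, which is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_NGROUPS		16
#define SKETCH_GID_NOGROUP	65533	/* assumed "nogroup" GID */

/* Hypothetical, simplified credential; groups[0] is the effective GID. */
struct ucred_sketch {
	int	ngroups;
	gid_t	groups[SKETCH_NGROUPS];
};

/*
 * Install a group list, falling back to a single group when the list
 * is empty, so groups[0] is never left uninitialized.
 */
static void
setgroups_fallback(struct ucred_sketch *cr, int ngrp, const gid_t *groups,
    gid_t fallback)
{
	if (ngrp <= 0) {
		cr->ngroups = 1;
		cr->groups[0] = fallback;
		return;
	}
	if (ngrp > SKETCH_NGROUPS)
		ngrp = SKETCH_NGROUPS;	/* this sketch simply truncates */
	for (int i = 0; i < ngrp; i++)
		cr->groups[i] = groups[i];
	cr->ngroups = ngrp;
}
```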
Some M_EXTPG mbufs are read-only (e.g. those backing sendfile
requests), but others are not. Add a flags argument to
mb_alloc_ext_pgs that can be used to set M_RDONLY when needed rather
than setting it unconditionally. Update mb_unmapped_to_ext to
preserve M_RDONLY from the unmapped mbuf.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D46783
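The following toy model shows the flag handling, with hypothetical names and flag values. The caller of the allocation decides whether M_RDONLY is set, and the conversion keeps that bit rather than dropping or forcing it. The real mb_unmapped_to_ext() produces a chain of ordinary mbufs, which is collapsed to a single one here.

```c
#include <assert.h>

/* Flag values assumed for this sketch. */
#define SKETCH_M_EXTPG	0x1
#define SKETCH_M_RDONLY	0x2

struct mbuf_sketch {
	int m_flags;
};

/* Hypothetical: the caller chooses whether the unmapped mbuf is RO. */
static void
alloc_ext_pgs(struct mbuf_sketch *m, int flags)
{
	m->m_flags = SKETCH_M_EXTPG | (flags & SKETCH_M_RDONLY);
}

/* Converting unmapped -> mapped preserves the read-only property. */
static void
unmapped_to_ext(const struct mbuf_sketch *src, struct mbuf_sketch *dst)
{
	dst->m_flags = src->m_flags & SKETCH_M_RDONLY;
}
```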
Two functions in tmpfs_vnops.c use an interface provided by
swap_pager.c. Move most of the implementation of those functions to
swap_pager.c so that they can be implemented more effectively, with
access to implementation details of the swap pager.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47212
There has been a documented case in the exports(5) man
page forever, which specifies that the -maproot or -mapall
may have a single user entry, followed by a ':'.
This case is defined as specifying no groups (aka cr_ngroups == 0).
This patch fixes the NFS server so that it handles this case correctly.
After MFC'ing this patch to stable/13 and stable/14, I propose that
this unusual case be deprecated and no longer allowed in FreeBSD15.
At that point, this patch can be reverted.
Reviewed by: brooks
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D47204
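The following sketch tells the three forms of the option value apart. This parsing is done by mountd(8) rather than by the kernel. `map_ngroups()` is a hypothetical helper. It returns -1 when no ':' is present, in which case the user's full group list applies. It returns 0 for the "user:" form, which yields cr_ngroups == 0.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical: count the groups in a -maproot/-mapall value of the form
 * "user[:group[:group...]]".  "user:" means an explicitly empty list.
 * Returns -1 if there is no ':' at all.
 */
static int
map_ngroups(const char *spec)
{
	const char *p = strchr(spec, ':');
	int n = 0;

	if (p == NULL)
		return (-1);
	while (*p == ':') {
		p++;
		if (*p == '\0' || *p == ':')
			continue;	/* empty component: no group */
		n++;
		p += strcspn(p, ":");	/* skip the group name */
	}
	return (n);
}
```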
tmpfs_seek_data_locked should return the offset of the first page
either resident in memory or in swap, but may return an offset to a
nonresident page. Check for residence to fix that.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D46879
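The following sketch shows the intended semantics over a hypothetical per-page state array. Only a page that is resident or in swap counts as data. The names and the 4 KiB page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum page_state { PG_HOLE, PG_RESIDENT, PG_SWAPPED };

#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical SEEK_DATA: return the byte offset of the first data at or
 * after "off" (off >= 0), or -1 if there is none.
 */
static long
seek_data(const enum page_state *pages, size_t npages, long off)
{
	size_t first = (size_t)off / SKETCH_PAGE_SIZE;

	for (size_t i = first; i < npages; i++) {
		if (pages[i] != PG_RESIDENT && pages[i] != PG_SWAPPED)
			continue;	/* hole: neither in memory nor swap */
		/* Inside the starting page, data begins at "off" itself. */
		return (i == first ? off : (long)(i * SKETCH_PAGE_SIZE));
	}
	return (-1);			/* lseek(2) would report ENXIO */
}
```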
nfsrv_freeopen() was being called after the mutex
lock was released, making it possible for other
kernel threads to change the lists while nfsrv_freeopen()
took the nfsstateid out of the lists.
This patch moves the code around
"if (nfsrv_freeopen(stp, vp, 1, p) == 0) {"
into nfsrv_freeopen(), so that it can remove the nfsstateid
structure from all lists before unlocking the mutex.
This should avoid any race between CLOSE and other nfsd threads
updating the NFSv4 state.
The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.
PR: 280978
Tested by: Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>
MFC after: 1 week
This reverts commit 9792c7d3eb.
The email thread "panic: nfsv4root ref cnt cpuid=1"
on freebsd-fs@freebsd.org describes
crashes that occurred for a NFSv4.1 client mount
using "oneopenown" where the same file is re-opened
many times by different processes.
The crashes appear to have been caused by the use
of the Lookup+Open RPC (which only happens for
mounts using the "oneopenown" option).
There appears to be a race between closure of the
open and the open acquired by the Lookup+Open RPC.
Since Lookup+Open RPCs are only an optimization
and can only be done for "oneopenown" at this time,
this patch reverts enabling of them.
It may be possible to fix the code so that
Lookup+Open works reliably, so the code is left
in place (although it will never be executed) for now.
Reported by: J David <j.david.lists@gmail.com>
MFC after: 2 weeks
Add a priv_check for PRIV_PROC_MEM_WRITE which will be blocked
by mac_veriexec if being enforced, unless the process has a maclabel
to grant priv.
Reviewed by: stevek
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D46692
The existing tmpfs implementation will return ENOTEMPTY for VOP_RMDIR,
or for the destination directory of VOP_RENAME, for any case in which
the directory is non-empty, even if the directory only contains
whiteouts.
Fix this by tracking total whiteout dirent allocation separately for
each directory, and avoid returning ENOTEMPTY if IGNOREWHITEOUT has
been specified by the caller and the total allocation of dirents is not
greater than the total whiteout allocation. This addresses "directory
not empty" failures seen on some recently-added unionfs stress2 tests
which use tmpfs as a base-layer filesystem.
A separate issue for independent consideration is that unionfs' default
behavior when deleting files or directories is to create whiteouts even
when it does not truly need to do so.
Differential Revision: https://reviews.freebsd.org/D45987
Reviewed by: kib (prior version), olce
Tested by: pho
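The following sketch shows the emptiness test, with hypothetical names. Sizes are dirent allocation totals, and "." and ".." are assumed not to be counted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-directory accounting, in bytes of dirent allocation. */
struct dir_sketch {
	size_t dirent_total;	/* all entries, whiteouts included */
	size_t whiteout_total;	/* whiteout entries only */
};

/*
 * Emptiness check for rmdir or a rename target.  With IGNOREWHITEOUT the
 * directory counts as empty if everything in it is a whiteout.
 */
static int
check_empty(const struct dir_sketch *d, bool ignorewhiteout)
{
	if (d->dirent_total == 0)
		return (0);
	if (ignorewhiteout && d->dirent_total <= d->whiteout_total)
		return (0);
	return (ENOTEMPTY);
}
```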
This flag is meant to request that the VOP implementation ignore
whiteout entries when processing directory contents.
Employ this flag (initially) in UFS when determining whether a directory
is empty for the purpose of deleting it or renaming another directory
over it. The previous UFS behavior was to always ignore whiteouts and
to therefore always allow directories containing only whiteouts to be
deleted or overwritten. This makes sense when the directory in question
is being accessed through a unionfs view in which the whiteouts produce
a unionfs directory that is logically empty, but it makes less sense
when directly operating against the UFS directory in which case silently
discarding the whiteouts may produce unexpected behavior in a current or
future unionfs view. IGNOREWHITEOUT is therefore treated as opt-in and
only specified by unionfs_rmdir() when invoking VOP_RMDIR() against the
upper filesystem. IGNOREWHITEOUT is not currently used for unionfs
rename operations, as the current implementation of unionfs_rename()
simply forbids renaming over any existing upper filesystem directory in
the first place.
Differential Revision: https://reviews.freebsd.org/D45987
Reviewed by: olce
Tested by: pho