tmpfs_seek_data_locked should return the offset of the first page
either resident in memory or in swap, but may return an offset to a
nonresident page. Check for residence to fix that.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D46879
nfsrv_freeopen() was being called after the mutex
lock was released, making it possible for other
kernel threads to change the lists while nfsrv_freeopen()
took the nfsstateid out of the lists.
This patch moves the code around
"if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
into nfsrv_freeopen(), so that it can remove the nfsstateid
structure from all lists before unlocking the mutex.
This should avoid any race between CLOSE and other nfsd threads
updating the NFSv4 state.
The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.
PR: 280978
Tested by: Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>
MFC after: 1 week
This reverts commit 9792c7d3eb.
The email thread "panic: nfsv4root ref cnt cpuid=1"
on freebsd-fs@freebsd.org descibes
crashes that occurred for a NFSv4.1 client mount
using "oneopenown" where the same file is re-opened
many times by different processes.
The crashes appear to have been caused by the use
of the Lookup+Open RPC (which only happens for
mounts using the "oneopenown" option).
There appears to be a race between closure of the
open and the open acquired by the Lookup+Open RPC.
Since Lookup+Open RPCs are only an optimization
and can only be done for "oneopenown" at this time,
this patch reverts enabling of them.
It may be possible to fix the code so that
Lookup+Open works reliably, so the code is left
in place (although it will never be executed) for now.
Reported by: J David <j.david.lists@gmail.com>
MFC after: 2 weeks
Add a priv_check for PRIV_PROC_MEM_WRITE which will be blocked
by mac_veriexec if being enforced, unless the process has a maclabel
to grant priv.
Reviewed by: stevek
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D46692
The existing tmpfs implementation will return ENOTEMPTY for VOP_RMDIR,
or for the destination directory of VOP_RENAME, for any case in which
the directory is non-empty, even if the directory only contains
whiteouts.
Fix this by tracking total whiteout dirent allocation separately for
each directory, and avoid returning ENOTEMPTY if IGNOREWHITEOUT has
been specified by the caller and the total allocation of dirents is not
greater than the total whiteout allocation. This addresses "directory
not empty" failures seen on some recently-added unionfs stress2 tests
which use tmpfs as a base-layer filesystem.
A separate issue for independent consideration is that unionfs' default
behavior when deleting files or directories is to create whiteouts even
when it does not truly need to do so.
Differential Revision: https://reviews.freebsd.org/D45987
Reviewed by: kib (prior version), olce
Tested by: pho
This flag is meant to request that the VOP implementation ignore
whiteout entries when processing directory contents.
Employ this flag (initially) in UFS when determining whether a directory
is empty for the purpose of deleting it or renaming another directory
over it. The previous UFS behavior was to always ignore whiteouts and
to therefore always allow directories containing only whiteouts to be
deleted or overwritten. This makes sense when the directory in question
is being accessed through a unionfs view in which the whiteouts produce
a unionfs directory that is logically empty, but it makes less sense
when directly operating against the UFS directory in which case silently
discarding the whiteouts may produce unexpected behavior in a current or
future unionfs view. IGNOREWHITEOUT is therefore treated as opt-in and
only specified by unionfs_rmdir() when invoking VOP_RMDIR() against the
upper filesystem. IGNOREWHITEOUT is not currently used for unionfs
rename operations, as the current implementation of unionfs_rename()
simply forbids renaming over any existing upper filesystem directory in
the first place.
Differential Revision: https://reviews.freebsd.org/D45987
Reviewed by: olce
Tested by: pho
When processing an RPC response that did not include any Owner
attribute, nfsv4_loadattr would return na_uid and na_gid uninitialized.
The uninitialized values could then make their way into the NFS
attribute cache via nfscl_loadattrcache.
PR: 281279
Reported by: KMSAN
MFC after: 2 weeks
Reviewed by: rmacklem
Sponsored by: Axcient
If the FUSE_GETATTR issued to query a file's size during
fuse_vnop_deallocate failed for any reason, then fuse_vnop_deallocate
would attempt to destroy an uninitialized fuse_dispatcher struct, with a
crash the likely result. This bug only affects FUSE file systems that
implement FUSE_FALLOCATE, and is unlikely to be seen on those that don't
disable attribute caching.
Reported by: Coverity Scan
CID: 1505308
MFC after: 2 weeks
Commit d8a5961 made a change to nfsv4_sattr() that broke
parsing of the setable attributes for a NFSv4 SETATTR.
(It broke out of the code by setting "error" and returning
right away, instead of noting the error in nd_repstat and
allowing parsing of the attributes to continue.)
By returning prematurely, it was possible for SETATTR to return
the error, but with a bogus set of attribute bits set, since
"retbits" had not yet been set to all zeros.
(I am not sure if any client could be affected by this bug.
The patch was done for a failure case detected by a pynfs test
suite and not an actual client.)
While here, the patch also fixes a
few cases where the value of attributes gets set for attributes
after an error has been set in nd_repstat. This would not really
break the protocol, since a SETATTR is allowed to set some attributes
and still return an failure, but should not really be done.
MFC after: 2 weeks
and keep reporting EINVAL for CREATE or RENAME lookups. The reasoning is
that non-corrupted filesystem cannot have invalid dirents anyway, so
returning ENOENT is more in-line with the natural behavior.
PR: 281033
MFC after: 1 week
RFC8275 defines a new attribute as an extension to NFSv4.2
called MODE_UMASK. This patch adds support for this attribute
to the NFSv4.2 client and server.
Since FreeBSD applies the umask above the VFS/VOP layer,
this attribute does not actually have any effect on the
handling of ACL inheritance, which is what it is designed for.
However, future changes to NFSv4.2 require support of it,
so this patch does that, resulting in behaviour identcal to
the mode attribute already supported.
MFC after: 2 months
RFC8275 defines a new attribute as an extension to NFSv4.2
called MODE_UMASK. This patch adds the attribute number
to nfsproto.h.
Future patches will add optional support for the attribute.
This patch does not cause any semantics change.
MFC after: 2 weeks
tmpfs_mem_percent is an int not a long, so on a 64-bit system this
writes 4 bytes past the end of the variable. The read above is correct,
so this was likely a copy paste error from sysctl_mem_reserved.
Found by: CHERI
Fixes: 636592343c ("tmpfs: increase memory reserve to a percent of available memory + swap")
The maximum cluster number was calculated based on the number of data
cluters that fit in the givem partition size and the size of the FAT
area. This limit did not take into account that the highest 10 cluster
numbers are reserved and must not be used for files.
PR: 280347
MFC after: 3 days
Reported by: pho@FreeBSD.org
- Remove most checks of the P_INMEM flag.
- Some uses remain since a few userspace tools, e.g., ps(1) and top(1)
expect the flag to be set. These can be cleaned up but the code has
most likely been copy-pasted elsewhere and while linger for a long
time.
Tested by: pho
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D46117
- Modify PHOLD() to no longer fault in the process.
- Remove _PHOLD_LITE(), which is now the same as _PHOLD(), fix up
consumers.
- Remove faultin() and its callees.
Tested by: pho
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D46114
Instead of casting a vop_F_args object to vop_generic_args, use a
vop_F_args.a_gen field when calling null_bypass(). This way we don't
hardcode the vop_generic_args data type in the callers of null_bypass().
Before this change, there were 3 null_bypass() calls using
a vop_F_args.a_gen field and 5 null_bypass() calls using a cast to
vop_generic_args. This change makes all null_bypass() calls consistent
and easier to maintain.
Pointed out by: jrtc27
Reviewed by: kib, oshogbo
Accepted by: oshogbo (mentor)
Differential Revision: https://reviews.freebsd.org/D37359
The NFS RFCs are pretty loose with respect to what characters
can be in a filename returned by a Readdir. However, FreeBSD,
as a POSIX system will not handle imbedded '/' or nul characters
in file names. Also, for NFSv4, the file names "." and ".."
are handcrafted on the client and should not be returned by a
NFSv4 server.
This patch scans for the above in filenames returned by Readdir and
ignores any entry returned by Readdir which has them in it.
Because an imbedded nul would be a string terminator, it was
not possible to code this check efficiently using string(3)
functions.
Reported by: Apple Security Engineering and Architecture (SEAR)
MFC after: 1 week
Fix a stale variable name that snuck into a tracepoint from an earlier
version of the change.
Fixes: eb60ff1e "unionfs: rework locking scheme to only lock a single
vnode"
Reported by: jenkins
This code is using the vnode after it has been released and causing a
panic when a p9fs shared volume is unmounted. In fact, it seems like it's
just duplicated code left behind from a bad merge.
PR: 279887
Reported by: Michael Dexter
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1323
Split the portion of unionfs_get_node_status() that searches for an
existing status object into a new helper function,
unionfs_find_node_status(), and use that in unionfs_close().
Additionally, modify unionfs_close() to accept a NULL status object
if unionfs_find_node_status() does not find a matching status
object. This can happen due to the unconditional VOP_CLOSE()
operation issued by vgonel().
Differential Revision: https://reviews.freebsd.org/D45398
Reviewed by: olce
Tested by: pho
Instead of locking both the lower and upper vnodes, which is both
complex and deadlock-prone, only lock the upper vnode, or the lower
vnode if no upper vnode is present.
In most cases this is all that is needed; for the cases in which
both vnodes do need to be locked, this change also employs deadlock-
avoiding techniques such as LK_NOWAIT and vn_lock_pair().
There are still some corner cases in which the current implementation
ends up taking multiple vnode locks across different filesystems
without taking special steps to avoid deadlock; those cases have
been noted in the comments.
Differential Revision: https://reviews.freebsd.org/D45398
Reviewed by: olce
Tested by: pho
gcc -Wmemset-elt-size diagnosed this. The code was only initializing
1/4 of the array. However, it was actually harmless, as the only caller
had done an M_ZERO allocation anyway.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D45701
Mostly copied from smbfs. This driver in its current state has the exact
same issue that prevents the generic putpages implementation from
working.
Sponsored by: https://www.patreon.com/valpackett
Reviewed by: dfr
Differential Revision: https://reviews.freebsd.org/D45639
MFC after: 3 months
Background:
If a user does pathconf(_, _PC_MIN_HOLE_SIZE) on a fusefs file system,
the kernel must actually issue a FUSE_LSEEK operation in order to
determine whether the server supports it. We cache that result, so we
only have to send FUSE_LSEEK the first time that _PC_MIN_HOLE_SIZE is
requested on any given mountpoint.
Problem 1:
Unlike fpathconf, pathconf operates on files that may not be open. But
FUSE_LSEEK requires the file to be open. As described in PR 278135,
FUSE_LSEEK cannot be sent for unopened files, causing _PC_MIN_HOLE_size
to wrongly report EINVAL. We never noticed that before because the
fusefs test suite only uses fpathconf, not pathconf. Fix this bug by
opening the file if necessary.
Problem 2:
On a completely sparse file, with no data blocks at all, FUSE_LSEEK with
SEEK_DATA would fail to ENXIO. That's correct behavior, but
fuse_vnop_pathconf wrongly interpreted that as "FUSE_LSEEK not
supported". Fix the interpretation.
PR: 278135
MFC after: 1 week
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D44618
The lib9p implementation takes a strict interpretation of the Twalk RPC
call and returns an error for attempts to lookup ".". The workaround is
to fake the lookup locally.
Reviewed by: Val Packett <val@packett.cool>
MFC after: 3 months
Commit dfaeeacc2c modified clientID handling so that it could be done
with only a mutex lock held when vfs.nfsd.enable_locallocks is 0.
This makes it unsafe to change the setting of vfs.nfsd.enable_locallocks
when nfsd threads are active.
This patch forces all nfsd threads to be blocked when the value
of vfs.nfsd.enable_locallocks is changed, so that it is done safely.
MFC after: 1 month
On Feb. 28, a problem was reported on freebsd-stable@ where a
nfsd thread processing an ExchangeID operation was blocked for
a long time by another nfsd thread performing a copy_file_range.
This occurred because the copy_file_range was taking a long time,
but also because handling a clientID requires that all other nfsd
threads be blocked via an exclusive lock, as required by ExchangeID.
This patch allows clientID handling to be done with only a mutex
held (instead of an exclusive lock that blocks all other nfsd threads)
when vfs.nfsd.enable_locallocks is 0. For the case of
vfs.nfsd.enable_locallocks set to 1, the exclusive lock that
blocks all nfsd threads is still required.
This patch does make changing the value of vfs.nfsd.enable_locallocks
somewhat racy. A future commit will ensure any change is done when
all nfsd threads are blocked to avoid this racyness.
MFC after: 1 month
On Feb. 28, a problem was reported on freebsd-stable@ where a
nfsd thread processing an ExchangeID operation was blocked for
a long time by another nfsd thread performing a copy_file_range.
This occurred because the copy_file_range was taking a long time,
but also because handling a clientID requires that all other nfsd
threads be blocked via an exclusive lock, as required by ExchangeID.
This patch adds two arguments to nfsv4_cleanclient() so that it
can optionally be called with a mutex held. For this patch, the
first of these arguments is "false" and, as such, there is no
change in semantics. However, this change will allow a future
commit to modify handling of the clientID so that it can be done
with a mutex held while other nfsd threads continue to process
NFS RPCs.
MFC after: 1 month
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.
Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.
To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:
vfs.root.mountfrom="p9fs:sharename"
for non-root filesystems add something like this to /etc/fstab:
sharename /mnt p9fs rw 0 0
In both examples, substitute the share name used on the bhyve command
line.
The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations. This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.
Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in significantly improved performance.
This patch adds this upgrade to the NFSv4.1/4.2 client and
enables use of read delegations.
For a test case of building a FreeBSD kernel (sources and
output objects) over a NFSv4.2 mount, these changes reduced
the elapsed time by 30% and included a reduction of 80% for
RPC counts when delegations were enabled. As such, with this
patch there are at least certain cases where enabling
delegations seems to be worth the increased complexity they
bring.
This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.
MFC after: 1 month
Since delegations are only issued for regular files, check
v_type to see if the query is for a regular file. This is
a simple optimization for the non-VREG case.
While here, fix a couple of global variable declarations.
This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.
MFC after: 1 month
NFSv4.1/4.2 defined new OPEN_WANT_xxx flags that a client
can use to hint to the server that delegations are or are
not wanted. This patch adds use of those delegations to
the client.
This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.
MFC after: 1 month
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in signoficantly improved performance.
This patch adds support for this atomic upgrade, plus fixes
a couple of other delegation related bugs. Since there were
three cases where delegations were being issued, the patch
factors this out into a separate function called
nfsrv_issuedelegations().
This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.
MFC after: 1 month
The PR reported hangs that were avoided when this commit was
reverted. Since it was only a cleanup, revert it.
The LORs in the PR need further investigation, since I think
readahead only hides the problem.
PR: 279138
This reverts commit fbe965591f.
Whenever file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked() as size 0
is ambigious: it means either the file size is 0 or the file size
is unknown.
Introduce special value for the size argument: VNODE_NO_SIZE.
Only when it is given, the vnode_create_vobject() will try to obtain
file's size on its own.
Introduce dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).
Handle the case of mediasize==0 in g_vfs_open().
Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
Which allows tmpfs_pager_writecount_recalc() to reliably detect
reclaimed vnode and make its accesses to object->un_pager.swp.private
(== vp) safe against reclaim. Note that vnode instantiation already
assigns v_object under the object lock.
Reviewed by: markj
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45119