RFC8881 specifies that, when a Link operation occurs on an
NFSv4 server, delegations for the file that were issued to other
clients must be recalled. Discovered during a recent discussion on nfsv4@ietf.org.
Although I have not observed a problem caused by not doing
the required delegation recall, it is definitely required
by the RFC, so this patch makes the server do the recall.
Tested during a recent NFSv4 IETF Bakeathon event.
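A rough sketch of the rule in C, under stated assumptions: struct deleg and link_recall_delegations() are hypothetical and are not the FreeBSD nfsd data structures, and setting the recalled field stands in for issuing a CB_RECALL callback. It only shows which delegations must be recalled: those on the linked file that are held by clients other than the one performing the Link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified delegation record (not the nfsd structures). */
struct deleg {
	int  clientid;	/* client holding the delegation */
	int  fileid;	/* file the delegation covers */
	bool recalled;	/* set once a recall has been issued */
};

/*
 * Before performing a Link on 'fileid' for 'requester', recall every
 * delegation on that file held by another client.
 * Returns the number of recalls issued.
 */
static size_t
link_recall_delegations(struct deleg *dv, size_t n, int fileid, int requester)
{
	size_t recalls = 0;

	assert(dv != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (dv[i].fileid == fileid && dv[i].clientid != requester &&
		    !dv[i].recalled) {
			dv[i].recalled = true;	/* stand-in for CB_RECALL */
			recalls++;
		}
	}
	return (recalls);
}
```

The requesting client's own delegation is left alone, and a second call issues no duplicate recalls.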
Approved by: re (cperciva)
(cherry picked from commit 3f65000b6b1460a7a23cd83014bb41a68d1a8a19)
(cherry picked from commit 3c414a8c2f)
While here remove an old comment regarding preallocation; it appears to
refer to an optimization that is almost certainly irrelevant at this
point.
No functional change intended.
MFC after: 1 week
(cherry picked from commit 78c51db3c4927db2437ec616b33ba1faf73f08ee)
There is only one place in the unpatched sources where B_DIRECT is
set in the NFS client, and that code is never executed. Since B_DIRECT
should therefore never be set, this patch removes the code that
handles it.
During an IETF testing event this week, I saw a crash in ncl_doio_directwrite(),
but this function is only called if B_DIRECT is set.
I cannot explain how ncl_doio_directwrite() got called, but once this patch
was applied to the sources, the crash did not recur. This is not surprising,
since this patch deleted the function.
(cherry picked from commit 03a39a17089adc1d0e28076670e664dcdebccf73)
When an initial attempt to close an NFSv4 open returns NFSERR_DELAY,
the open structure is put on a list for delayed closing. When this
is done, the nfso_own field is set to NULL, so it cannot be used by
nfsrpc_doclose().
Without this patch, the NFSv4 client can crash when an NFSv4 server
replies NFSERR_DELAY to a Close operation. Fortunately, most extant
NFSv4 servers do not do this. This patch avoids the crash for any
that do return NFSERR_DELAY for Close.
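A minimal sketch of the kind of guard involved, using hypothetical struct owner and struct cl_open types rather than the real client structures: once an open has been deferred for a later Close, its owner pointer is cleared, so any later Close path must check it before dereferencing. This illustrates the hazard; it is not the actual change to nfsrpc_doclose().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the client's open-owner and open structures. */
struct owner { int id; };
struct cl_open { struct owner *own; int delayed; };

/* Queue an open for a delayed Close, as happens after NFSERR_DELAY. */
static void
defer_close(struct cl_open *op)
{
	op->delayed = 1;	/* put on the delayed-close list */
	op->own = NULL;		/* owner pointer is no longer usable */
}

/* Owner id to use for a Close, or -1 if the owner has been detached. */
static int
close_owner_id(const struct cl_open *op)
{
	assert(op != NULL);
	if (op->own == NULL)	/* deferred close: do not dereference */
		return (-1);
	return (op->own->id);
}
```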
Found during an IETF bakeathon testing event this week.
(cherry picked from commit 6251027c4252edb3b8f8fc359a40e610349e9af3)
This reverts commit f300335d9aebf2e99862bf783978bd44ede23550.
It turns out that the old code was correct and it was wireshark
that was broken and indicated that the RPC's XDR was bogus.
Found during IETF bakeathon testing this week.
(cherry picked from commit 54c3aa02e926268ba5551cd7d28fddf38b3589a2)
Commit 196787f79e67 erroneously assumed that the client code for
Open/Claim_deleg_cur_FH was broken, but it was not.
It was actually wireshark that was broken and indicated
that the correct XDR was bogus.
This reverts the part of 196787f79e67 that changed the arguments for
Open/Claim_deleg_cur_FH.
Found during the IETF bakeathon testing event this week.
(cherry picked from commit 8efba70d7914324890b1f8fe3079036eb2b5c3db)
There are a few spots in which unionfs_lookup() accesses unionfs vnode
private data without holding the corresponding vnode lock or interlock.
Reviewed by: kib, olce
Differential Revision: https://reviews.freebsd.org/D44601
(cherry picked from commit b18029bc59d2ed6b0eeeb233189cf713b34b467c)
If access from unreserved ports is disabled, then a remote host can
cause an NFS server to log a message by sending a packet. This is
useful for diagnosing problems but bad for resiliency in the case where
the server is being spammed with a large number of rejected requests.
Limit prints to once per second (racily).
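A sketch of a once-per-second limit in C. log_allowed() is a hypothetical helper that takes the current time in seconds as an argument so it can be tested in isolation. The read and update of the static timestamp are not atomic, which is the sense in which such a limit is "racy": two threads can occasionally both log in the same second. That is an acceptable cost for avoiding a lock on this path.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Return true at most once per distinct second. Not thread-safe by
 * design: a concurrent caller may slip through, yielding an extra line.
 */
static bool
log_allowed(time_t now)
{
	static time_t last = 0;

	if (now == last)
		return (false);
	last = now;
	return (true);
}
```

A rejection path would check log_allowed() before printing and drop the message otherwise.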
Reviewed by: rmacklem, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44819
(cherry picked from commit b7e4666d7b69c22699a9299687018a892a5dad5b)
The MNT_IGNORE flag can be used to mark certain filesystem mounts so
that utilities such as df(1) and mount(8) can filter out those mounts by
default. This can be used, for instance, to reduce the noise from
running container workloads inside jails which often have at least three
and sometimes as many as ten mounts per container.
The flag is supplied by the nmount(2) system call and is recorded so
that it can be reported by statfs(2). Unfortunately several filesystems
override the default behaviour and mask out the flag, defeating its
purpose. This change preserves the MNT_IGNORE flag for those filesystems
so that it can be reported correctly.
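A sketch of the idea with placeholder flag values (the EX_ constants are illustrative, not the real <sys/mount.h> values): a filesystem that reports only its own flags drops MNT_IGNORE, while OR-ing the mount's MNT_IGNORE bit back in lets a df(1)-style filter see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration only. */
#define EX_MNT_RDONLY 0x00000001u
#define EX_MNT_LOCAL  0x00001000u
#define EX_MNT_IGNORE 0x00800000u

/* f_flags as a filesystem might report them through statfs(2). */
static uint64_t
report_flags(uint64_t fs_own_flags, uint64_t mnt_flags)
{
	/* Keep the filesystem's own flags, but carry MNT_IGNORE through. */
	return (fs_own_flags | (mnt_flags & EX_MNT_IGNORE));
}

/* Default listing filter: hide MNT_IGNORE mounts unless all are requested. */
static bool
should_list(uint64_t f_flags, bool show_all)
{
	return (show_all || (f_flags & EX_MNT_IGNORE) == 0);
}
```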
MFC after: 1 week
(cherry picked from commit b5c4616582cebdcf4dee909a3c2f5b113c4ae59e)
FUSE emits spurious incoherency warnings in writethrough mode. The
warnings are triggered by setattr calls generated by vnode truncation
turning the cached va_size vattr stale, causing comparisons with the
fresh version provided by the server to fail. Only validate the vnode's
va_size vattr if the FN_SIZECHANGE flag is set.
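A sketch of the size check as described above. The flag value and the size_incoherent() helper are hypothetical; the real fusefs code presumably validates other attributes as well, and only the va_size comparison is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_FN_SIZECHANGE 0x100u	/* placeholder value for FN_SIZECHANGE */

/*
 * Decide whether a va_size mismatch between the cached attributes and
 * the server's fresh attributes should be reported as incoherent.
 */
static bool
size_incoherent(unsigned node_flags, uint64_t cached_size, uint64_t fresh_size)
{
	/* Per the change: only validate va_size when FN_SIZECHANGE is set. */
	if ((node_flags & EX_FN_SIZECHANGE) == 0)
		return (false);
	return (cached_size != fresh_size);
}
```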
This is a part of the research work at RCSLab, University of Waterloo.
Reviewed by: asomers
Pull Request: https://github.com/freebsd/freebsd-src/pull/1110
(cherry picked from commit 8758bf0aaec1d4b2ebcb429e8cabc691c2c95461)
The author reported that this patch was needed to avoid
crashes on a fairly busy RISC-V system. The author did not
provide details w.r.t. the crashes. Although I
have not seen any such crash, the patch looks reasonable
and I have not found any regressions when testing it.
Since "rdirplus" is not a default option, the patch is
only needed if you are doing NFS mounts with the "rdirplus"
mount option and seeing crashes related to the name cache.
(cherry picked from commit d00c64bb2347cc620d31a178c7755aa7e594f065)
This lets tarfs provide readahead/behind hints to the VFS, which helps
memory-mapped I/O performance; this is important when faulting in
executables out of a tarfs mount, as one might do if tarfs is used to back
the root filesystem, for example. The improvement is particularly
noticeable when the backing tarball is zstd-compressed.
The implementation simply returns the extent of the virtual block
containing the target offset, clamped by the maximum I/O size. This is
perhaps simplistic; it effectively just chooses values that would
correspond to a single VOP_READ call in tarfs_read_file().
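A sketch of the arithmetic in bytes. The real VOP_BMAP interface reports run lengths in blocks, and block_hint() here is hypothetical. The hint consists of the bytes before the target offset within its virtual block and the bytes from the offset to the end of that block, each clamped to the maximum I/O size.

```c
#include <assert.h>
#include <stdint.h>

/* Readahead/readbehind hint, in bytes. */
struct ra_hint {
	uint64_t behind;	/* bytes in the block before 'off' */
	uint64_t ahead;		/* bytes from 'off' to the end of the block */
};

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/* Hint for the virtual block [blk_start, blk_start + blk_len) holding off. */
static struct ra_hint
block_hint(uint64_t blk_start, uint64_t blk_len, uint64_t off, uint64_t maxio)
{
	struct ra_hint h;

	assert(off >= blk_start && off - blk_start < blk_len);
	h.behind = min_u64(off - blk_start, maxio);
	h.ahead = min_u64(blk_start + blk_len - off, maxio);
	return (h);
}
```

For example, offset 4096 in a 64 KiB block at 0 with a 128 KiB limit yields 4096 bytes behind and 61440 ahead.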
Reviewed by: des, kib
MFC after: 1 month
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D44626
(cherry picked from commit a0895e394d3fec374e61a207bdfa0245dae86f53)
There is no obvious reason to use a value smaller than that.
Reviewed by: des, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D44627
(cherry picked from commit 91eca18551554b7aca80fcfd3c648f524b321252)
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D44599
(cherry picked from commit b1fd95c9e24791d44593e611406b41e57826a5b8)
tarfs: Ignore global extended headers.
Previously, we would error out if we encountered a global extended
header, because we don't know what it means. This doesn't really
matter though, and traditionally, tar implementations have either
ignored them or treated them as plain files, so just ignore them.
This allows tarfs to mount tar files created by `git archive`.
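A sketch of the typeflag dispatch using a hypothetical classify_typeflag() helper; the set of entry types accepted here is illustrative. In POSIX pax archives, typeflag 'x' marks an extended header that applies to the next entry, and 'g' marks a global extended header, which this sketch skips instead of rejecting.

```c
#include <assert.h>

enum hdr_action { HDR_ENTRY, HDR_EXT_LOCAL, HDR_SKIP, HDR_ERROR };

static enum hdr_action
classify_typeflag(char tf)
{
	switch (tf) {
	case '\0':	/* old-style regular file */
	case '0':	/* regular file */
	case '1':	/* hard link */
	case '2':	/* symbolic link */
	case '5':	/* directory */
		return (HDR_ENTRY);
	case 'x':	/* pax extended header for the next entry */
		return (HDR_EXT_LOCAL);
	case 'g':	/* pax global extended header: ignore it */
		return (HDR_SKIP);
	default:
		return (HDR_ERROR);
	}
}
```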
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D44600
(cherry picked from commit 584e1c355ae3c994331005b7196cc87a714e5317)
tarfs: Fix 32-bit build.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D44613
(cherry picked from commit 0238d3711d9b888f678fce4274eccc9175168395)
unionfs has a bunch of clunky special-case code to avoid creating
unionfs wrapper vnodes for AF_UNIX sockets. This was added in 2008
to address PR 118346, but in the intervening years the VOP_UNP_*
operations have been added to provide a clean interface to allow
sockets to work in the presence of stacked filesystems.
PR: 275871
Reviewed by: kib (prior version), olce
Tested by: Karlo Miličević <karlo98.m@gmail.com>
Differential Revision: https://reviews.freebsd.org/D44288
(cherry picked from commit eee6217b40df5877674df1d9aec7d5bd04202876)
NFSv4.2 supports a Copy operation, which avoids file data being
read to the client and then written back to the server, if both
input and output files are on the same NFSv4.2 mount for
copy_file_range(2).
Unfortunately, this Copy operation can take a long time under
certain circumstances. If this occurs concurrently with an RPC
that requires an exclusive lock on the nfsd such as ExchangeID
done for a new mount, the result can be an nfsd "stall" until
the Copy completes.
This patch adds a sysctl that can be set to limit the size of
a Copy operation or, if set to 0, disable Copy operations.
The use of this sysctl and other ways to avoid Copy operations
taking too long will be documented in the nfsd.4 man page by
a separate commit.
(cherry picked from commit 748f56c53f4286e0b140c1b779ff8ade1cf4fec9)
Since non-doomed unionfs vnodes always share their primary lock with
either the lower or upper vnode, any forwarded call to the base FS
which transiently drops that upper or lower vnode lock may result in
the unionfs vnode becoming completely unlocked during that transient
window. The unionfs vnode may then become doomed by a concurrent
forced unmount, which can lead to either or both of the following:
--Complete loss of the unionfs lock: in the process of being
doomed, the unionfs vnode switches back to the default vnode lock,
so even if the base FS VOP reacquires the upper/lower vnode lock,
that no longer translates into the unionfs vnode being relocked.
This will then violate that caller's locking assumptions as well
as various assertions that are enabled with DEBUG_VFS_LOCKS.
--Complete loss of reference on the upper/lower vnode: the caller
normally holds a reference on the unionfs vnode, while the unionfs
vnode in turn holds references on the upper/lower vnodes. But in
the course of being doomed, the unionfs vnode will drop the latter
set of references, which can effectively lead to the base FS VOP
executing with no references at all on its vnode, violating the
assumption that vnodes can't be recycled during these calls and
(if lucky) violating various assertions in the base FS.
Fix this by adding two new functions, unionfs_forward_vop_start_pair()
and unionfs_forward_vop_finish_pair(), which are intended to bookend
any forwarded VOP which may transiently unlock the relevant vnode(s).
These functions are currently only applied to VOPs that modify file
state (and require vnode reference and lock state to be identical at
call entry and exit), as the common reason for transiently dropping
locks is to update filesystem metadata.
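A toy model of the reference-counting half of such a bookend. It is not the real unionfs_forward_vop_start_pair()/unionfs_forward_vop_finish_pair(); struct toy_vnode, forward_start() and forward_finish() are hypothetical, and relocking is omitted. The extra reference taken before the forwarded call keeps the base vnode alive even if the unionfs vnode is doomed and drops its own reference. The finish step reports whether that happened, so the caller does not assume it still holds the unionfs lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vnode: only a reference count and a "doomed" flag. */
struct toy_vnode { int refs; bool doomed; };

/* Before forwarding: pin the base vnode with an extra reference. */
static void
forward_start(struct toy_vnode *base)
{
	base->refs++;
}

/*
 * After forwarding: drop the extra reference and return false if the
 * unionfs vnode was doomed while it was transiently unlocked.
 */
static bool
forward_finish(struct toy_vnode *uvp, struct toy_vnode *base)
{
	assert(base->refs > 0);
	base->refs--;
	return (!uvp->doomed);
}
```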
Reviewed by: olce
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D44076
(cherry picked from commit 6c8ded001540fd969ebc2eabd45a0066ebcc662b)
* Reject hard or soft links with an empty target path. Currently, a
debugging kernel will hit an assertion in tarfs_lookup_path() while
a non-debugging kernel will happily create a link to the mount root.
* Use a temporary variable to store the result of the link target path,
and copy it to tnp->other only once we have found it to be valid.
Otherwise we error out after creating a reference to the target but
before incrementing the target's reference count, which results in a
use-after-free situation in the cleanup code.
* Correctly return ENOENT from tarfs_lookup_path() if the requested
path was not found and create_dirs is false. Luckily, existing
callers did not rely solely on the return value.
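A sketch of the ordering described in the first two items, using hypothetical types and helpers (struct tnode, toy_lookup(), make_link()) and illustrative error values. It rejects an empty target, keeps the lookup result in a local variable, and takes the reference before storing the pointer, so an early error never leaves tnp->other pointing at an unreferenced node.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy node: a name, a reference count and a link target pointer. */
struct tnode { const char *name; int refs; struct tnode *other; };

/* Hypothetical lookup over a flat table of nodes. */
static struct tnode *
toy_lookup(struct tnode *tab, size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, path) == 0)
			return (&tab[i]);
	return (NULL);
}

static int
make_link(struct tnode *tnp, struct tnode *tab, size_t n, const char *target)
{
	struct tnode *tgt;

	if (target == NULL || target[0] == '\0')
		return (EINVAL);		/* empty target: reject */
	tgt = toy_lookup(tab, n, target);	/* result kept in a local */
	if (tgt == NULL)
		return (ENOENT);
	tgt->refs++;				/* take the reference first... */
	tnp->other = tgt;			/* ...then publish the pointer */
	return (0);
}
```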
MFC after: 3 days
PR: 277360
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: sjg
Differential Revision: https://reviews.freebsd.org/D44161
(cherry picked from commit 38b3683592d4c20a74f52a6e8e29368e6fa61858)
tarfs: Improve validation of numeric fields.
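Tar numeric fields are ASCII octal, padded with spaces or NULs. The following is a sketch of stricter parsing, not the actual tarfs change: parse_octal() is hypothetical, and it does not handle the base-256 binary encoding that some tar implementations use for large values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a tar numeric field: optional leading spaces, one or more octal
 * digits, then only NUL or space padding. Rejects stray characters,
 * all-blank fields and values that would overflow 64 bits.
 */
static bool
parse_octal(const char *p, size_t len, uint64_t *out)
{
	size_t i = 0;
	uint64_t v = 0;

	while (i < len && p[i] == ' ')
		i++;
	if (i == len || p[i] < '0' || p[i] > '7')
		return (false);
	for (; i < len && p[i] >= '0' && p[i] <= '7'; i++) {
		if (v > (UINT64_MAX >> 3))
			return (false);	/* next shift would overflow */
		v = (v << 3) | (uint64_t)(p[i] - '0');
	}
	for (; i < len; i++)
		if (p[i] != '\0' && p[i] != ' ')
			return (false);
	*out = v;
	return (true);
}
```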
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: sjg, allanjude
Differential Revision: https://reviews.freebsd.org/D44166
(cherry picked from commit 8427d94ce05682abb6c75e2a27c8c497962c0dc5)
tarfs: Avoid overflow in exthdr calculation.
MFC after: 3 days
PR: 277420
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44202
(cherry picked from commit c291b7914e1db9469cc820abcb1f5dde7a6f7f28)
tarfs: Remove unnecessary hack and obsolete comment.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D44203
(cherry picked from commit e212f0c0666e7d3a24dce03b8c88920d14b80e47)
tarfs: Fix checksum calculation.
The checksum code assumed that struct ustar_header filled an entire
block and calculated the checksum based on the size of the structure.
The header is in fact only 500 bytes long while the checksum covers
the entire block (“logical record” in POSIX terms). Add padding and
an assertion, and clean up the checksum code.
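A sketch based on the POSIX ustar layout. The header fields occupy 500 bytes, and padding brings the structure to one 512-byte record, which is checked with a static assertion. The checksum is the unsigned sum of all 512 bytes, with the 8-byte chksum field counted as spaces. The struct and function names are hypothetical, not tarfs's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCKSIZE 512	/* one ustar header record */
#define CHKSUM_OFF    148	/* offset of the 8-byte chksum field */
#define CHKSUM_LEN    8

/* POSIX ustar header fields (500 bytes) plus padding to a full record. */
struct ex_ustar_header {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char chksum[8];
	char typeflag;
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char prefix[155];
	char padding[12];
};
static_assert(sizeof(struct ex_ustar_header) == TAR_BLOCKSIZE,
    "ustar header must fill exactly one record");
static_assert(offsetof(struct ex_ustar_header, chksum) == CHKSUM_OFF,
    "unexpected chksum offset");

/* Checksum over the whole record, not just sizeof the 500 field bytes. */
static uint32_t
ustar_checksum(const unsigned char blk[TAR_BLOCKSIZE])
{
	uint32_t sum = 0;

	for (size_t i = 0; i < TAR_BLOCKSIZE; i++) {
		if (i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN)
			sum += ' ';	/* chksum field counts as blanks */
		else
			sum += blk[i];
	}
	return (sum);
}
```

An all-zero record sums to 256 (eight spaces). A nonzero byte in the padding changes the result, which is why the 500-byte structure size alone is not enough.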
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44226
(cherry picked from commit 0118b0c8e58a438a931a5ce1bf8d7ae6208cc61b)
tarfs: Factor out common test code.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D44227
(cherry picked from commit 32b8aac6f9b77a1c4326083472d634e5de427547)
tarfs: Fix checksum on 32-bit platforms.
MFC after: 3 days
Fixes: b56872332e47786afc09515a4daaf1388da4d73c
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D44261
(cherry picked from commit cbddb2f02c7687d1039abcffd931e94e481c11a5)
Store the upper/lower FS mount objects in unionfs per-mount data and
use these instead of the v_mount field of the upper/lower root
vnodes. As described in the referenced PR, it is unsafe to access this
field on the unionfs unmount path as ZFS rollback may have obliterated
the v_mount field of the upper or lower root vnode. Use these stored
objects to slightly simplify other code that needs access to the
upper/lower mount objects as well.
PR: 275870
Reported by: Karlo Miličević <karlo98.m@gmail.com>
Tested by: Karlo Miličević <karlo98.m@gmail.com>
Reviewed by: kib (prior version), olce
Differential Revision: https://reviews.freebsd.org/D43815
(cherry picked from commit cc3ec9f7597882d36ee487fd436d1b90bed0ebfd)
If the underlying upper FS supports shared locking for write ops,
as is the case with ZFS, VOP_FSYNC() may only be called with the vnode
lock held shared. In this case, temporarily upgrade the lock for
those unionfs maintenance operations which require exclusive locking.
While here, make unionfs inherit the upper FS' support for shared
write locking. Since the upper FS is the target of VOP_GETWRITEMOUNT()
this is what will dictate the locking behavior of any unionfs caller
that uses vn_start_write() + vn_lktype_write(), so unionfs must be
prepared for the caller to only hold a shared vnode lock in these
cases.
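A toy model of the "temporarily upgrade" pattern, with hypothetical lock states and a hypothetical maintenance_op(). In the kernel, this would correspond to vn_lock() with LK_UPGRADE and a later LK_DOWNGRADE. A real upgrade can lose the shared lock and must be rechecked, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock states standing in for unlocked / LK_SHARED / LK_EXCLUSIVE. */
enum toy_lk { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

static void
maintenance_op(enum toy_lk *lk, int *work)
{
	bool upgraded = false;

	assert(*lk != TOY_UNLOCKED);	/* caller must hold the vnode lock */
	if (*lk == TOY_SHARED) {
		*lk = TOY_EXCLUSIVE;	/* stand-in for LK_UPGRADE */
		upgraded = true;
	}
	(*work)++;			/* work that needs exclusive access */
	if (upgraded)
		*lk = TOY_SHARED;	/* stand-in for LK_DOWNGRADE */
}
```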
Found in local testing of unionfs atop ZFS with DEBUG_VFS_LOCKS.
Reviewed by: kib, olce
Differential Revision: https://reviews.freebsd.org/D43817
(cherry picked from commit 2656fc29be8b0fc1cd9e64ed52aa0a61fe87744c)
unionfs_mkshadowdir() may be invoked on a non-leaf pathname component
during lookup, in which case the NUL terminator of the pathname buffer
will be well beyond the end of the current component. cn_namelen in
this case will still (correctly) indicate the length of only the
current component, but ZFS in particular does not currently respect
cn_namelen, leading to the creation of inaccessible files with slashes
in their names. Work around this behavior by temporarily NUL-
terminating the current pathname component for the call to VOP_MKDIR().
https://github.com/openzfs/zfs/issues/15705 has been filed to track
a proper upstream fix for the issue at hand.
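A sketch of the workaround in isolation. call_with_component() and record_len() are hypothetical, and the callback stands in for VOP_MKDIR(). The byte after the current component is saved, overwritten with NUL for the duration of the call, and then restored. A callee that ignores the length therefore still sees only the current component.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*component_fn)(const char *name, void *arg);

/* Call fn with only the first namelen bytes of buf visible as a string. */
static int
call_with_component(char *buf, size_t namelen, component_fn fn, void *arg)
{
	char saved;
	int error;

	assert(namelen <= strlen(buf));
	saved = buf[namelen];
	buf[namelen] = '\0';	/* temporarily terminate the component */
	error = fn(buf, arg);
	buf[namelen] = saved;	/* restore the rest of the pathname */
	return (error);
}

/* Demo callback: record the length of the name it was given. */
static int
record_len(const char *name, void *arg)
{
	*(size_t *)arg = strlen(name);
	return (0);
}
```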
PR: 275871
Reported by: Karlo Miličević <karlo98.m@gmail.com>
Tested by: Karlo Miličević <karlo98.m@gmail.com>
Reviewed by: kib, olce
Differential Revision: https://reviews.freebsd.org/D43818
(cherry picked from commit a2ddbe019d51b35f9da2cb5ddca8c69f0ee422da)
If a file system's on-disk format does not support st_birthtime, it
isn't clear what value it should return in stat(2). Neither our man
page nor the Open Group specification says. But our convention for UFS and
msdosfs is to return { .tv_sec = -1, .tv_nsec = 0 }. fusefs is
different. It returns { .tv_sec = -1, .tv_nsec = -1 }. It's done that
ever since the initial import in SVN r241519.
Most software apparently handles this just fine. It must, because we've
had no complaints. But the Rust standard library will panic when
reading such a timestamp during std::fs::metadata, even if the caller
doesn't care about that particular value. That's a separate bug, and
should be fixed.
Change our invalid value to match msdosfs and ufs, pacifying the Rust
standard library.
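A sketch of the convention and of the kind of range check a consumer may apply; both helper names are hypothetical. The old fusefs value, with tv_nsec = -1, fails a check that nanoseconds lie in [0, 1000000000), while the UFS/msdosfs value passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* The "no birthtime" value used by UFS and msdosfs (and now fusefs). */
static struct timespec
no_birthtime(void)
{
	struct timespec ts = { .tv_sec = -1, .tv_nsec = 0 };

	return (ts);
}

/* Consumer-side sanity check: tv_nsec must lie in [0, 1e9). */
static bool
timespec_valid(struct timespec ts)
{
	return (ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
}
```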
PR: 276602
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D43590
(cherry picked from commit 55b80e2ca52c4b27c4920d372a6e71ac9ab7da9e)
If a copy_file_range operation tries to read from a page that was
previously written via mmap, that page must be flushed first.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43451
(cherry picked from commit 1c909c300b92601f7690610097ac98126caff835)
Commit 57ce37f9dcd0 modified the NFSv4.2 Copy operation so that
it will update atime on the infd file whenever possible.
This is done by adding a Setattr of TimeAccess for the
input file.
This patch disables this change for the case of an NFSv4.2
mount with the "noatime" mount option, which avoids the
additional Setattr of TimeAccess operation.
(cherry picked from commit cc760de2183f9c9a4099783d3ff4c770521a4cb6)
If the NFS server detects that the Kerberos credentials provided
by an NFSv4.1/4.2 mount using sec=krb5[ip] have expired, the NFS
server replies with a krpc layer error of RPC_AUTHERROR.
When this happened, the client erroneously left the NFSv4.1/4.2
session slot busy, so that it could not be used by other RPCs.
If this happened for all session slots, the mount point would
hang.
This patch fixes the problem by releasing the session slot
and resetting its sequence# upon receiving an RPC_AUTHERROR
reply.
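A toy model of a session slot table, under stated assumptions. NSLOTS, struct session, slot_get() and slot_free() are hypothetical. Rolling the sequence number back by one is one plausible reading of "resetting its sequence#": because the server never processed the request, the next use of the slot should reuse the same number. The key point is that the RPC_AUTHERROR path must free the slot. Otherwise slots leak until slot_get() finds none, which is the hang described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 4	/* illustrative; a FreeBSD client/server uses 64 */

struct session {
	bool     busy[NSLOTS];
	uint32_t seq[NSLOTS];
};

/* Claim a free slot and bump its sequence number; -1 if all are busy. */
static int
slot_get(struct session *s)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!s->busy[i]) {
			s->busy[i] = true;
			s->seq[i]++;
			return (i);
		}
	}
	return (-1);
}

/* Release a slot, optionally undoing the sequence number increment. */
static void
slot_free(struct session *s, int slot, bool resetseq)
{
	assert(slot >= 0 && slot < NSLOTS && s->busy[slot]);
	s->busy[slot] = false;
	if (resetseq)
		s->seq[slot]--;
}
```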
This bug only affects NFSv4.1/4.2 mounts using sec=krb5[ip],
but has existed since NFSv4.1 client support was added to
FreeBSD.
So, why has the bug remained undetected for so long?
I cannot be sure, but I suspect that, often, the client detected
the Kerberos credential expiration before attempting the RPC.
For this case, the client would not do the RPC and, as such,
there would be no busy session slot. Also, no hang would
occur until all session slots were busy (64 for a FreeBSD
client/server), so many cases of the bug probably went undetected.
Also, sec=krb5[ip] mounts are not that common.
PR: 275905
(cherry picked from commit a558130881e9d574dc5f37827fe2284667d5aba8)