The FUSE protocol allows the client (kernel) to cache a file's size, if the
server (userspace daemon) allows it. A well-behaved daemon obviously should
not change a file's size while a client has it cached. But a buggy daemon
might. If the kernel ever detects that that has happened, then it should
invalidate the entire cache for that file. Previously, we would not only
cache stale data, but in the case of a file extension while we had the size
cached, we accidentally extended the cache with zeros.
PR: 244178
Reported by: Ben RUBSON <ben.rubson@gmx.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24012
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.
Sponsored by: The FreeBSD Foundation
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache. If not, then
they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.
The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later. However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.
This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
bypassed.
Sponsored by: The FreeBSD Foundation
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23. The client will use that granularity when
updating its cached timestamps during write. This way the timestamps won't
appear to change following flush.
Sponsored by: The FreeBSD Foundation
Writing should implicitly update a file's mtime and ctime. For fuse, the
server is supposed to do that. But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE. When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.
Sponsored by: The FreeBSD Foundation
Protocol version 7.10 has only one new feature, and I'm choosing not to
implement it, so this commit is basically a noop. The sole new feature is
the FOPEN_NONSEEKABLE flag, which a fuse file system can return to indicate
that a certain file handle cannot be seeked. However, I'm unaware of any
file system in ports that uses this flag.
Sponsored by: The FreeBSD Foundation
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it. Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20377
If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle. Otherwise concurrent reads are not allowed. This commit implements
it. Previously we unconditionally disallowed concurrent reads.
Sponsored by: The FreeBSD Foundation
This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3). More work is needed to make the in-kernel nfsd work, because of
its stateless nature. It doesn't open files prior to doing I/O. Also, the
NFS-related VOPs currently ignore the entry cache.
Sponsored by: The FreeBSD Foundation
Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr. There are still a few
calls elsewhere that happen as a result of a write.
Sponsored by: The FreeBSD Foundation
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning. It was very confusing. This commit eliminates the former. It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.
Sponsored by: The FreeBSD Foundation
fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc. But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero. Fix by adding a distinct flag value.
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago and I don't know why. The description
seems at odds with the code. While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago for no clear reason. Perhaps it was
intended to gate an unstable feature? But now there's no reason to globally
disable mmap. I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.
Sponsored by: The FreeBSD Foundation
This was added > 6.5 years ago with no evident reason why. It probably had
something to do with the incomplete cached attribute implementation. But
cache attributes work now. I see no reason to retain this sysctl.
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago for no clear purpose. I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now. Since there's no clear motivation for
this sysctl, it's best to remove it.
Sponsored by: The FreeBSD Foundation
This looks like it may have been a workaround for a specific buggy FUSE
filesystem. However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.
Sponsored by: The FreeBSD Foundation
These panics all lie in the error path. The only one I've hit is caused by
a buggy FUSE server unexpectedly changing the type of a vnode.
Sponsored by: The FreeBSD Foundation
Don't panic if the server changes the file type of a file without us first
deleting it. That could indicate a buggy server, but it could also be the
result of one of several race conditions. Return EAGAIN as we do elsewhere.
Sponsored by: The FreeBSD Foundation
The FUSE protocol includes a way for a server to tell the client that a
negative lookup response is cacheable for a certain amount of time.
PR: 236226
Sponsored by: The FreeBSD Foundation
Follow-up to r346046. These two commits implement fuse cache timeouts for
both entries and attributes. They also remove the vfs.fusefs.lookup_cache
enable sysctl, which is no longer needed now that cache timeouts are
honored.
PR: 235773
Sponsored by: The FreeBSD Foundation
The FUSE protocol allows the server to specify the timeout period for the
client's attribute and entry caches. This commit implements the timeout
period for the attribute cache. The entry cache's timeout period is
currently disabled because it panics, and is guarded by the
vfs.fusefs.lookup_cache_expire sysctl.
PR: 235773
Reported by: cem
Sponsored by: The FreeBSD Foundation
FUSE_LOOKUP, FUSE_GETATTR, FUSE_SETATTR, FUSE_MKDIR, FUSE_LINK,
FUSE_SYMLINK, FUSE_MKNOD, and FUSE_CREATE all return file attributes with a
cache validity period. fusefs will now cache the attributes, if the server
returns a non-zero cache validity period.
This change does _not_ implement finite attr cache timeouts. That will
follow as part of PR 235773.
PR: 235775
Reported by: cem
Sponsored by: The FreeBSD Foundation
If a fuse file system returne FOPEN_KEEP_CACHE in the open or create
response, then the client is supposed to _not_ clear its caches for that
file. I don't know why clearing the caches would be the default given that
there's a separate flag to bypass the cache altogether, but that's the way
it is. fusefs(5) will now honor this flag.
Our behavior is slightly different than Linux's because we reuse file
handles. That means that open(2) wont't clear the cache if there's a
reusable file handle, even if the file server wouldn't have sent
FOPEN_KEEP_CACHE had we opened a new file handle like Linux does.
PR: 236560
Sponsored by: The FreeBSD Foundation
The FUSE protocol says that FUSE_FLUSH should be send every time a file
descriptor is closed. That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close. However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.
There are two purposes for FUSE_FLUSH. One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side. The other is to release POSIX file locks (which fusefs(5) does
not yet support).
PR: 236405, 236327
Sponsored by: The FreeBSD Foundation
During truncate, fusefs was discarding entire cached blocks, but it wasn't
zeroing out the unused portion of a final partial block. This resulted in
reads returning stale data.
PR: 233783
Reported by: fsx
Sponsored by: The FreeBSD Foundation
By default, FUSE performs authorization in the server. That means that it's
insecure for the client to reuse FUSE file handles between different users,
groups, or processes. Linux handles this problem by creating a different
FUSE file handle for every file descriptor. FreeBSD can't, due to
differences in our VFS design.
This commit adds credential information to each fuse_filehandle. During
open(2), fusefs will now only reuse a file handle if it matches the exact
same access mode, pid, uid, and gid of the calling process.
PR: 236844
Sponsored by: The FreeBSD Foundation
The FUSE protocol allows each open file descriptor to have a unique file
handle. On FreeBSD, these file handles must all be stored in the vnode.
The old method (also used by OSX and OpenBSD) is to store them all in a
small array. But that limits the total number that can be stored. This
commit replaces the array with a linked list (a technique also used by
Illumos). There is not yet any change in functionality, but this is the
first step to fixing several bugs.
PR: 236329, 236340, 236381, 236560, 236844
Discussed with: cem
Sponsored by: The FreeBSD Foundation
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:
1. Totally redundant with dtrace FBT probes. These I deleted.
2. Print textual information, usually error messages. These I converted to
SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
printf statements except they can be enabled at runtime with dtrace. They
can be filtered by FILE and/or by priority.
3. More complicated probes that print detailed information. These I
converted into ad-hoc SDT probes.
Also, de-inline fuse_internal_cache_attrs. It's big enough to be a regular
function, and this way it gets a dtrace FBT probe.
This commit is a merge of r345304, r344914, r344703, and r344664 from
projects/fuse2.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19667
This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".
Reviewed by: cem, rgrimes
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19649
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:
1) Totally redundant with dtrace FBT probes. These I deleted.
2) Print textual information, usually error messages. These I converted to
SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
printf statements except they can be enabled at runtime with dtrace.
They can be filtered by FILE and/or by priority.
3) More complicated probes that print detailed information. These I
converted into ad-hoc SDT probes.
Sponsored by: The FreeBSD Foundation
Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but
somewhat less painful to look at and edit.
No functional change.
At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.
Prior to this change, we implement and default to writeback caching. This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.
In this change, add cache mode as an extension of cache enable/disable. The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).
For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data. WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement. Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.
Refs:
* https://sourceforge.net/p/fuse/mailman/message/8902254/
* https://github.com/vgough/encfs/issues/315
PR: 230258 (inspired by)
Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).
PR: 230258 (related)
The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.
Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.
For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.
Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.
PR: 230258 (inspired by; does not fix)
The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.
Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.
For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.
In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.
One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy. It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?
Some minor APIs changed to facilitate this:
1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").
2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0. It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).
3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.
4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache). The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.
PR: 230258 (inspired by; does not fix)