Commit graph

2324 commits

Author SHA1 Message Date
Don Morris
640c2ff8ae ufs: Avoid M_WAITOK allocations when building a dirhash
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions.  This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.

ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.

PR:		253992
Reviewed by:	markj, mckusick, vangyzen

(cherry picked from commit f17a590085)
2021-05-27 09:05:50 -04:00
Konstantin Belousov
22f23299b7 b_vflags update requires bufobj lock
(cherry picked from commit e3d6759585)
2021-04-23 14:14:10 +03:00
Kirk McKusick
44f01dbeda Ensure that the mount command shows "with quotas" when quotas are enabled.
(cherry picked from commit 14d0cd7225)
2021-04-18 10:08:49 -07:00
Konstantin Belousov
def8b2b427 FFS extattr: fix handling of the tail
(cherry picked from commit 8742817ba6)
2021-03-04 21:07:25 +02:00
Kirk McKusick
66308a13dd Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash
PR:           253158

(cherry picked from commit 8563de2f27)
(cherry picked from commit c31480a1f6)
2021-02-25 14:56:20 +02:00
Konstantin Belousov
ffa424772e Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
(cherry picked from commit 6f30ac9995)
2021-02-25 14:55:18 +02:00
Konstantin Belousov
e2827f8a13 ffs_close_ea: do not relock vnode under lock_ea
(cherry picked from commit 5e198e7646)
2021-02-25 14:55:18 +02:00
Konstantin Belousov
120c4f6405 ffs_vnops.c: style
(cherry picked from commit c6d68ca842)
2021-02-25 14:55:18 +02:00
Konstantin Belousov
94412c2d00 ffs: do not call softdep_prealloc() from UFS_BALLOC()
(cherry picked from commit 4983146279)
2021-02-25 14:55:18 +02:00
Konstantin Belousov
fd61ccfeb4 ffs_reallocblks: change the guard for softdep_prealloc() call to DOINGSUJ()
(cherry picked from commit cc9958bf22)
2021-02-25 14:55:18 +02:00
Konstantin Belousov
c6e46d0c6b fifo: minor comment and assert improvements.
(cherry picked from commit adf28ab456)
2021-02-24 09:48:37 +02:00
Konstantin Belousov
3f0157cb65 ffs_unlock: assert that IN_ENDOFF is not leaked past locked scope
(cherry picked from commit 26af9f72f7)
2021-02-24 09:48:20 +02:00
Konstantin Belousov
ea8aa5f3ed ffs softdep: Force processing of VI_OWEINACT vnodes when there is inode shortage
(cherry picked from commit 28703d2713)
2021-02-24 09:48:00 +02:00
Konstantin Belousov
b1ed1a5151 softdep_request_cleanup: wait for softdep_request_clean_flush() to pass
(cherry picked from commit 2011b44fa3)
2021-02-24 09:47:41 +02:00
Konstantin Belousov
6dde909bb5 ufs_inactive(): stop hiding ERELOOKUP from ffs_truncate(), return it.
(cherry picked from commit 013168db8c)
2021-02-24 09:47:18 +02:00
Konstantin Belousov
bbf612a1af Stop ignoring ERELOOKUP from VOP_INACTIVE()
(cherry picked from commit b59a8e63d6)
2021-02-24 09:46:59 +02:00
Konstantin Belousov
d72a156fb6 ufs vnops: brace softdep_prelink() with DOINGSUJ instead of DOINGSOFTDEP
(cherry picked from commit 6aed2435c8)
2021-02-24 09:46:41 +02:00
Konstantin Belousov
df1314860a ffs softdep: remove will_direnter argument of softdep_prelink()
(cherry picked from commit ede40b0675)
2021-02-24 09:46:18 +02:00
Konstantin Belousov
220cd6b045 ufs_direnter: directory truncation does not need special case for rename
(cherry picked from commit 06f2918ab8)
2021-02-24 09:45:57 +02:00
Konstantin Belousov
e1e108f351 ufs_rename: use VOP_VPUT_PAIR and rely on directory sync/truncation there
(cherry picked from commit 038fe6e089)
2021-02-24 09:45:39 +02:00
Konstantin Belousov
c918922cb9 ufs_direnter: move directory truncation to ffs_vput_pair().
(cherry picked from commit 74a3652f83)
2021-02-24 09:45:21 +02:00
Konstantin Belousov
b7d28c4e90 ffs_vput_pair(): try harder to recover from the vnode reclaim
(cherry picked from commit 30bfb2fa0f)
2021-02-24 09:45:01 +02:00
Konstantin Belousov
1de6eb52fd FFS: implement special VOP_VPUT_PAIR().
(cherry picked from commit f2c9d038bd)
2021-02-24 09:44:41 +02:00
Konstantin Belousov
6c465e4719 ffs_snapshot: use VOP_VPUT_PAIR after VOP_CREATE.
(cherry picked from commit be44e98637)
2021-02-24 09:44:07 +02:00
Konstantin Belousov
ed3b4bbe35 ufs_direnter/SU: unconditionally UFS_UPDATE inode when extending directory
(cherry picked from commit 08c2dc2841)
2021-02-24 09:42:20 +02:00
Konstantin Belousov
861d47845f ffs_syncvnode: only clear IN_NEEDSYNC after successful sync
(cherry picked from commit 1de1e2bfbf)
2021-02-24 09:41:58 +02:00
Konstantin Belousov
ed06398293 Merge ufs_fhtovp() into ffs_inotovp().
(cherry picked from commit 89fd61d955)
2021-02-24 09:41:37 +02:00
Konstantin Belousov
e3e958f3a4 ffs_inotovp(): interface to convert (ino, gen) into alive vnode
(cherry picked from commit 5952c86c78)
2021-02-24 09:41:12 +02:00
Konstantin Belousov
9255b36faf ffs: Add FFSV_REPLACE_DOOMED flag to ffs_vgetf()
(cherry picked from commit f16c26b1c0)
2021-02-24 09:40:52 +02:00
Konstantin Belousov
0672f9d808 ffs: call ufsdirhash_dirtrunc() right after setting directory size
(cherry picked from commit e94f2f1be3)
2021-02-24 09:40:29 +02:00
Konstantin Belousov
750252612d buf SU hooks: track buf_start() calls with B_IOSTARTED flag
(cherry picked from commit bf0db19339)
2021-02-24 09:40:04 +02:00
Konstantin Belousov
4b2a20dfde ffs_vnops.c: Move opt_*.h includes to the top.
(cherry picked from commit 0281f88e5d)
2021-02-19 14:45:52 +02:00
Mateusz Guzik
afea6cb020 ufs: denote lack of support for lockless symlink lookup
It is unclear without investigating if it can be provided without using
extra memory, so for the time being just don't.

(cherry picked from commit c892d60a1d)
2021-02-01 12:38:23 +00:00
Kirk McKusick
1aa1ede1fd MFC: a63eae6
Revert 2d4422e799, Eliminate lock order reversal in UFS ffs_unmount().

After discussion with Chuck Silvers (chs@) we have decided that
there is a better way to resolve this lock order reversal which
will be committed separately.

Sponsored by: Netflix

(cherry picked from commit a63eae65ff)
2021-01-30 00:15:41 -08:00
Kirk McKusick
79a5c790bd Eliminate a locking panic when cleaning up UFS snapshots after a
disk failure.

Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update. As part of mounting a
UFS filesystem with snapshots, each of the vnodes describing a
snapshot has its individual lock replaced with the snapshot lock.
When the filesystem is unmounted the vnode's original lock is
returned replacing the snapshot lock.

When a disk fails while the UFS filesystem it contains is still
mounted (for example when a thumb drive is removed) UFS forcibly
unmounts the filesystem. The loss of the drive causes the GEOM
subsystem to orphan the provider, but the consumer remains until
the filesystem has finished with the unmount. Information describing
the snapshot locks was being prematurely cleared during the orphaning
causing the return of the snapshot vnode's original locks to fail.
The fix is to not clear the needed information prematurely.

Sponsored by: Netflix
2021-01-15 16:36:42 -08:00
Kirk McKusick
173779b98f Eliminate lock order reversal in UFS when unmounting filesystems
with snapshots.

Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update.  As part of mounting a
UFS filesystem with snapshots, each of the vnodes describing a
snapshot has its individual lock replaced with the snapshot lock.
When the filesystem is unmounted the vnode's original lock is
returned replacing the snapshot lock.

The lock order reversal happens because vnode locks must be acquired
before snapshot locks. When unmounting we must lock both the snapshot
lock and the vnode lock before swapping them so that the vnode will
be continuously locked during the swap. For each vnode representing
a snapshot, we must first acquire the snapshot lock to ensure
exclusive access to it and its original lock.  We then face a lock
order reversal when we try to acquire the original vnode lock. The
problem is eliminated by doing a non-blocking exclusive lock on the
original lock which will always succeed since there are no users
of that lock.

Sponsored by: Netflix
2021-01-15 16:03:01 -08:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Kirk McKusick
2d4422e799 Eliminate lock order reversal in UFS ffs_unmount().
UFS uses a new "mntfs" pseudo file system which provides private
device vnodes for a file system to safely access its disk device.
The original device vnode is saved in um_odevvp to hold the exclusive
lock on the device so that any attempts to open it for writing will
fail. But it is otherwise unused and has its BO_NOBUFS flag set to
enforce that file systems using mntfs vnodes do not accidentally
use the original devfs vnode. When the file system is unmounted,
um_odevvp is no longer needed and is released.

The lock order reversal happens because device vnodes must be locked
before UFS vnodes. During unmount, the root directory vnode lock
is held. When calling vrele() on um_odevvp, vrele() attempts to
exclusive lock um_odevvp causing the lock order reversal. The problem
is eliminated by doing a non-blocking exclusive lock on um_odevvp
which will always succeed since there are no users of um_odevvp.
With um_odevvp locked, it can be released using vput which does not
attempt to do a blocking exclusive lock request and thus avoids the
lock order reversal.

Sponsored by: Netflix
2021-01-11 16:49:07 -08:00
Thomas Munro
e7347be9e3 ffs: Support O_DSYNC.
Respect the new IO_DATASYNC flag when performing synchronous writes.
Compared to O_SYNC, O_DSYNC lets us skip updating the inode in some
cases, matching the behaviour of fdatasync(2).

Reviewed by: kib
Differential Review: https://reviews.freebsd.org/D25160
2021-01-08 13:15:56 +13:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Mateusz Guzik
9997aedb8f ufs: use VNPASS when asserting on a vnode in ufs_read_pgcache
2021-01-01 03:14:11 +00:00
Mark Johnston
ace3d9475c ffs: Avoid out-of-bounds accesses in the fs_active bitmap
We use a bitmap to track which cylinder groups have changed between
snapshot creation and filesystem suspension.  The "legs" of the bitmap
are four bytes wide (see ACTIVESET()) so we must round up the allocation
size to a multiple of four bytes.
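
The rounding can be illustrated outside the kernel. This is only a sketch
under the assumption of 32-bit legs; the ex_* names are hypothetical and do
not correspond to the actual ffs code or to the ACTIVESET() macro.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bits held by one "leg" of the bitmap (assumed to be 32 here). */
#define EX_LEG_BITS	(sizeof(uint32_t) * CHAR_BIT)

/* Bytes needed for nbits bits, rounded up to a whole number of legs. */
static size_t
ex_bitmap_bytes(size_t nbits)
{
	size_t legs = (nbits + EX_LEG_BITS - 1) / EX_LEG_BITS;

	return (legs * sizeof(uint32_t));
}

/* Allocate a zeroed bitmap large enough for leg-wide accesses. */
static uint32_t *
ex_bitmap_alloc(size_t nbits)
{
	return (calloc(1, ex_bitmap_bytes(nbits)));
}

/* Set one bit; this reads and writes the full four-byte leg. */
static void
ex_bitmap_set(uint32_t *map, size_t bit)
{
	assert(map != NULL);
	map[bit / EX_LEG_BITS] |= (uint32_t)1 << (bit % EX_LEG_BITS);
}

/* Return 1 if the bit is set, 0 otherwise. */
static int
ex_bitmap_test(const uint32_t *map, size_t bit)
{
	assert(map != NULL);
	return ((int)((map[bit / EX_LEG_BITS] >> (bit % EX_LEG_BITS)) & 1));
}
```

With nine bits, a byte-granular size would be two bytes, but setting bit 8
touches a whole four-byte leg, so the allocation has to be four bytes.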

I believe this bug is harmless since UMA/kmem_* will both pad the
allocation and zero the full allocation.  Note that malloc() does inline
zeroing when the allocation size is known at compile-time.

Reported by:	pho (using KASAN)
Reviewed by:	kib, mckusick
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27731
2020-12-23 11:16:40 -05:00
Ryan Libby
93dba42c0e ffs: quiet -Wstrict-prototypes
Reviewed by:	kib, markj, mckusick
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D27558
2020-12-11 22:51:57 +00:00
Kirk McKusick
bb3c01ec79 Document the BA_CLRBUF flag used in ufs and ext2fs filesystems.
Suggested by: kib
MFC after:    3 days
Sponsored by: Netflix
2020-12-06 20:50:21 +00:00
Konstantin Belousov
2c7ada9917 ufs: handle two more cases of possible VNON vnode returned from VFS_VGET().
Reported by:	kevans
Reviewed by:	mckusick, mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27457
2020-12-06 18:09:14 +00:00
Konstantin Belousov
21a45add50 ffs: do not read full direct blocks if they are going to be overwritten.
BA_CLRBUF specifies that the existing contents of the block will be
completely overwritten by the caller, so there is no reason to spend I/O
fetching the existing data.  We do the same for indirect blocks.

Reported by:	tmunro
Reviewed by:	mckusick, tmunro
Tested by:	pho, tmunro
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D27353
2020-11-30 17:03:26 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save a significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
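
The need for the extra page slot follows from simple page arithmetic. The
sketch below assumes 4 KiB pages and uses a hypothetical helper,
ex_pages_spanned(); it is an illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the example; real systems may differ. */
#define EX_PAGE_SIZE	((uintptr_t)4096)

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * Hypothetical helper, not a kernel interface.
 */
static size_t
ex_pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return (0);
	assert(len - 1 <= UINTPTR_MAX - addr);	/* range must not wrap */
	first = addr / EX_PAGE_SIZE;
	last = (addr + (len - 1)) / EX_PAGE_SIZE;
	return ((size_t)(last - first) + 1);
}
```

For a 1 MiB transfer, a page-aligned buffer touches 256 pages, while the
same transfer starting 512 bytes into a page touches 257, which is the
additional entry the +1 accounts for.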

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Konstantin Belousov
92bcefd1d2 clear_inodedeps: handle ERELOOKUP from ffs_syncvnode().
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-11-26 18:03:24 +00:00
Konstantin Belousov
07ef907f6e ffs_softdep.c: get_parent_vp(): Fix bp lock leak when inum inode was already freed.
Reported by:	markj, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-11-25 17:12:21 +00:00
Konstantin Belousov
8a1509e442 Handle LoR in flush_pagedep_deps().
When operating in SU or SU+J mode, ffs_syncvnode() might need to
instantiate other vnode by inode number while owning syncing vnode
lock.  Typically this other vnode is the parent of our vnode, but due
to renames occurring right before fsync (or during fsync when we drop
the syncing vnode lock, see below) it might be no longer parent.

More, the called function flush_pagedep_deps() needs to lock other
vnode while owning the lock for vnode which owns the buffer, for which
the dependencies are flushed.  This creates another instance of the
same LoR as was fixed in softdep_sync().

Put the generic code for safe relocking into new SU helper
get_parent_vp() and use it in flush_pagedep_deps().  The case for safe
relocking of two vnodes with undefined lock order was extracted into
vn helper vn_lock_pair().
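
The general idea of taking two locks that have no defined order can be
sketched in userland with POSIX mutexes. This is an analogy only, assuming
a try-lock-and-back-off scheme; ex_lock_pair() is a hypothetical name and
the code is not the implementation of vn_lock_pair().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Acquire both mutexes without relying on a lock order.  Block on one,
 * then only try the other; on failure drop the first and block on the
 * contended one next, so we never sleep while holding a lock.
 * Hypothetical helper.
 */
static void
ex_lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_t *tmp;

	assert(a != b);		/* locking the same mutex twice would hang */
	for (;;) {
		pthread_mutex_lock(a);
		if (pthread_mutex_trylock(b) == 0)
			return;		/* both held */
		pthread_mutex_unlock(a);
		/* Swap roles: wait on the lock that was busy. */
		tmp = a;
		a = b;
		b = tmp;
	}
}
```

Because the first lock is released whenever the second cannot be taken
immediately, two threads calling the helper with the arguments in opposite
order cannot deadlock against each other.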

Due to call sequence
     ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(),
ffs_syncvnode() indicates with ERELOOKUP that passed vnode was
unlocked in process, and can return ENOENT if the passed vnode
reclaimed.  All callers of the function were inspected.

Because UFS namei lookups store auxiliary information about directory
entry in in-memory directory inode, and this information is then used
by UFS code that creates/removes directory entry in the actual
mutating VOPs, it is critical that directory vnode lock is not dropped
between lookup and VOP.  For softdep_prelink(), which ensures that
later link/unlink operation can proceed without overflowing the
journal, calls were moved to the place where it is safe to drop
processing VOP because mutations are not yet applied.  Then, ERELOOKUP
causes restart of the whole VFS operation (typically VFS syscall) at
top level, including the re-lookup of the involved paths.  [Note that
we already do the same restart for failing calls to vn_start_write(),
so formally this patch does not introduce new behavior.]

Similarly, unsafe calls to fsync in snapshot creation code were
plugged.  A possible view on these failures is that it does not make
sense to continue creating snapshot if the snapshot vnode was
reclaimed due to forced unmount.

It is possible that relock/ERELOOKUP situation occurs in
ffs_truncate() called from ufs_inactive().  In this case, dropping the
vnode lock is not safe.  Detect the situation with VI_DOINGINACT and
reschedule inactivation by setting VI_OWEINACT.  ufs_inactive()
rechecks VI_OWEINACT and avoids reclaiming vnode if truncation failed
this way.

In ffs_truncate(), allocation of the EOF block for partial truncation
is re-done after vnode is synced, since we cannot leave the buffer
locked through ffs_syncvnode().

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:30:10 +00:00