r208454,r208455,r208458:
r207920:
Back out r205134. It is not stable.
r207934:
Add missing new line characters to the warnings.
r207936:
Eventhough r203504 eliminates taste traffic provoked by vdev_geom.c,
ZFS still like to open all vdevs, close them and open them again,
which in turn provokes taste traffic anyway.
I don't know of any clean way to fix it, so do it the hard way - if we can't
open provider for writing just retry 5 times with 0.5 pauses. This should
elimitate accidental races caused by other classes tasting providers created on
top of our vdevs.
Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
Reported by: Yuri Pankov <yuri.pankov@gmail.com>
r207937:
I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.
r207970:
When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.
No objections from: kib
Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua>
r208142:
The whole point of having dedicated worker thread for each leaf VDEV was to
avoid calling zio_interrupt() from geom_up thread context. It turns out that
when provider is forcibly removed from the system and we kill worker thread
there can still be some ZIOs pending. To complete pending ZIOs when there is
no worker thread anymore we still have to call zio_interrupt() from geom_up
context. To avoid this race just remove use of worker threads altogether.
This should be more or less fine, because I also thought that zio_interrupt()
does more work, but it only makes small UMA allocation with M_WAITOK.
It also saves one context switch per I/O request.
PR: kern/145339
Reported by: Alex Bakhtin <Alex.Bakhtin@gmail.com>
r208147:
Add task structure to zio and use it instead of allocating one.
This eliminates the only place where we can sleep when calling zio_interrupt().
As a side-effect this can actually improve performance a little as we
allocate one less thing for every I/O.
Prodded by: kib
r208148:
Allow to configure UMA usage for ZIO data via loader and turn it on by
default for amd64. On i386 I saw performance degradation when UMA was used,
but for amd64 it should help.
r208166:
Fix userland build by making io_task available only for the kernel and by
providing taskq_dispatch_safe() macro.
r208454:
Remove ZIO_USE_UMA from arc.c as well.
r208455:
ZIO_USE_UMA is no longer used.
r208458:
Create UMA zones unconditionally.
r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
r206794,r206795,r206796,r206797:
r203504:
Open provider for writting when we find the right one. Opening too much
providers for writing provokes huge traffic related to taste events send
by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.
Reorted by: Kurt Touet <ktouet@gmail.com>, mr
Tested by: mr, Baginski Darren <kickbsd@ya.ru>
r204067:
Update comment. We also look for GPT partitions.
r204073:
Add tunable and sysctl to skip hostid check on pool import.
r204101:
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz@visp.ru>
r204804:
Remove racy assertion.
Reported by: Attila Nagy <bra@fsn.hu>
Obtained from: OpenSolaris, Bug ID 6827260
r205079:
Remove bogus assertion.
Reported by: Johan Ström <johan@stromnet.se>
Obtained from: OpenSolaris, Bug ID 6920880
r205080:
Force commit to correct Bug ID:
Obtained from: OpenSolaris, Bug ID 6920880
r205132:
Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads
r205133:
fix compilation under ZIO_USE_UMA
r205134:
make UMA the default allocator for ZFS buffers - this avoids
a great deal of contention in kmem_alloc
r205231:
- reduce contention by breaking up ARC state locks in to 16 for data
and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
contended counter (no lock protection needed for atomics)
r205253:
use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
r205264:
- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
r205346:
The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
or that pool is being created and vdev doesn't contain proper guid yet.
To be able to handle pool creation we open vdev by path anyway.
Because of 3 it is possible that we open wrong vdev on import which can lead to
confusions.
The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.
r206051:
IOCPARM_MAX defines maximum size of a structure that can be passed
directly to ioctl(2). Because of how ioctl command is build using _IO*()
macros we have only 13 bits to encode structure size. So the structure
can be up to 8kB-1.
Currently we define IOCPARM_MAX as PAGE_SIZE.
This is IMHO wrong for three main reasons:
1. It is confusing on archs with page size larger than 8kB (not really
sure if we support such archs (sparc64?)), as even if PAGE_SIZE is
bigger than 8kB, we won't be able to encode anything larger in ioctl
command.
2. It is a waste. Why the structure can be only 4kB on most archs if we
have 13 bits dedicated for that, not 12?
3. It shouldn't depend on architecture and page size. My ioctl command
can work on one arch, but can't on the other?
Increase IOCPARM_MAX to 8kB and make it independed of PAGE_SIZE and
architecture it is compiled for. This allows to use all the bits on all the
archs for size. Note that this doesn't mean we will copy more on every ioctl(2)
call. No. We still copyin(9)/copyout(9) only exact number of bytes encoded in
ioctl command.
Practical use for this change is ZFS. zfs_cmd_t structure used for ZFS
ioctls is larger than 4kB.
Silence on: arch@
r206667:
Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.
thread0 (vfs_fhtovp) thread1 (vop_getattr) thread2 (zfs_recv)
-------------------- --------------------- ------------------
vn_lock
rrw_enter_read
rrw_enter_write (hangs)
rrw_enter_read (hangs)
vn_lock (hangs)
Reported by: Attila Nagy <bra@fsn.hu>
r206792:
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
r206793:
Remove racy assertion.
Obtained from: OpenSolaris
r206794:
Extend locks scope to match OpenSolaris.
r206795:
Add missing list and lock destruction.
r206796:
Style fixes.
r206797:
Restore previous order.
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.
Approved by: re (kib)
- add FreeBSD implementation of xdrmem_control needed by zfs
- have zfs define xdr_ops using FreeBSD's definition
- remove solaris xdr files from zfs compile
about removing a few #ifdefs and providing compatibility wrappers and
VOP implementations to get and set an ACL; ZFS does ACL enforcement all
by itself.
Note that the VOPs are ifdefed out for now, so this change should be
a no-op.
Reviewed by: pjd
This bring huge amount of changes, I'll enumerate only user-visible changes:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.
- slog
Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
implementing some of them using existing ones.
- Allow to compile ZFS on all archs and use atomic operations surrounded
by global mutex on archs we don't have or can't have all atomic
operations needed by ZFS.