Commit graph

117906 commits

Author SHA1 Message Date
Dag-Erling Smørgrav
d80fbbee5f Correct sysctl names. 2017-08-09 07:24:58 +00:00
Sepherosa Ziehau
9c6cae2431 hyperv/hn: Implement transparent mode network VF.
How network VF works with hn(4) on Hyper-V in transparent mode:

- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
  address.
- Once the network VF is attached, the cooresponding hn(4) waits several
  seconds to make sure that the network VF attach routing completes, then:
  o  Set the intersection of the network VF's if_capabilities and the
     cooresponding hn(4)'s if_capabilities to the cooresponding hn(4)'s
     if_capabilities.  And adjust the cooresponding hn(4) if_capable and
     if_hwassist accordingly. (*)
  o  Make sure that the cooresponding hn(4)'s TSO parameters meet the
     constraints posed by both the network VF and the cooresponding hn(4).
     (*)
  o  The network VF's if_input is overridden.  The overriding if_input
     changes the input packet's rcvif to the cooreponding hn(4).  The
     network layers are tricked into thinking that all packets are
     neceived by the cooresponding hn(4).
  o  If the cooresponding hn(4) was brought up, bring up the network VF.
     The transmission dispatched to the cooresponding hn(4) are
     redispatched to the network VF.
  o  Bringing down the cooresponding hn(4) also brings down the network
     VF.
  o  All IOCTLs issued to the cooresponding hn(4) are pass-through'ed to
     the network VF; the cooresponding hn(4) changes its internal state
     if necessary.
  o  The media status of the cooresponding hn(4) solely relies on the
     network VF.
  o  If there are multicast filters on the cooresponding hn(4), allmulti
     will be enabled on the network VF. (**)
- Once the network VF is detached.  Undo all damages did to the
  cooresponding hn(4) in the above item.

NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled.  The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1.  The network
VF transparent mode is _not_ enabled by default, as of this commit.

The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.

The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.

(*)
These decisions were made so that things will not be messed up too much
during the transition period.

(**)
This does _not_ need to go through the fancy multicast filter management
stuffs like what vlan(4) has, at least currently:
- As of this write, multicast does not work in Azure.
- As of this write, multicast packets go through the cooresponding hn(4).

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11803
2017-08-09 05:59:45 +00:00
Kirk McKusick
77b63aa0fc Since the switch to GPT disk labels, fsck for UFS/FFS has been
unable to automatically find alternate superblocks. This checkin
places the information needed to find alternate superblocks to the
end of the area reserved for the boot block.

Filesystems created with a newfs of this vintage or later will
create the recovery information. If you have a filesystem created
prior to this change and wish to have a recovery block created for
your filesystem, you can do so by running fsck in forground mode
(i.e., do not use the -p or -y options). As it starts, fsck will
ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should
answer yes.

Discussed with: kib, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11589
2017-08-09 05:17:21 +00:00
Alan Cox
5471caf6f1 Introduce vm_page_grab_pages(), which is intended to replace loops calling
vm_page_grab() on consecutive page indices.  Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().

Reviewed by:	kib, markj
Tested by:	pho (an earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11926
2017-08-09 04:23:04 +00:00
Marcin Wojtas
29a263df94 Update pl310 node in Armada 38x DTS to match the one used in Linux
Since the cache controller nodes fixup is added to the platform code,
this patch aligns it to the Linux device tree representation.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11884
2017-08-09 01:31:05 +00:00
Marcin Wojtas
75b2aa51b4 Enable pl310 coherent operation in platform init for Armada 38x
Updating PL310 sotfware context sc_io_coherent field in
platform_pl310_init() routine for Armada 38x helps to avoid
using 'arm,io-coherent' property, which is by default not present
in the device tree node in Linux.

This way another step for DT unification between two operating
systems is done. The improvemnt will also work after enabling
PLATFORM for Marvell ARMv7 SoCs.

Reviewed by: andrew, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11883
2017-08-09 01:25:47 +00:00
Marcin Wojtas
30608f6dd6 Remove clock-frequency properties from Armada 38x timer nodes
Since the timers' base frequency setting is added to the platform code,
this patch removes clock-frequency properties from global
and twd timers, aligning both to the Linux device tree.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11882
2017-08-09 01:20:53 +00:00
Marcin Wojtas
1070a9141c Dynamically configure timers' base frequency for Armada 38x
Instead of using 'clock-frequency' device tree property for global/twd
mpcore timers of Armada 38x SoCs, set it in platform_late_init stage
with arm_tmr_change_frequency() function.

Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11881
2017-08-09 01:14:29 +00:00
Marcin Wojtas
edf3dd3b0a Enable using ofw_bus_find_compatible in early platform code
Before this patch function ofw_bus_find_compatible was using
memory allocations in order to find compatible node and the property's
length. This way there was always a suited buffer for property,
however this approach had also disadvantages - ofw_bus_find_compatible
couldn't be used when malloc is not available, e.g. during fdt fixup stage.

In order to remove the usage limitation of ofw_bus_find_compatible(),
this patch modifies the function to use ofw_bus_node_is_compatible()
(instead of the one without _int suffix), which uses a fixed
buffer on stack instead of dynamic allocations.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11880
2017-08-09 01:06:40 +00:00
Marcin Wojtas
a355bb8846 Add support for "compatible" parameter in ofw_fdt_fixup
Sometimes it's convenient to provide fixup to many boards
that use the same SoC family (eg. Marvell Armada 38x).
Instead of putting multiple entries in fdt_fixup_table,
use one entry which refers to all boards with given SoC.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11878
2017-08-09 00:56:29 +00:00
Marcin Wojtas
f2d9a004fb Restore original /soc ranges on Marvell Armada 38x boards
Because fdt_get_ranges can process now multiple 'ranges' entries,
restoring the ranges from original Linux device trees is possible.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11877
2017-08-09 00:51:45 +00:00
Marcin Wojtas
bfd084c8c4 Enable parsing simple-bus 'ranges' with multiple entries
This patch makes possible to boot with up to 8 ranges in soc.
Dynamic allocation cannot be used, because ftd_get_ranges
function is called early, when malloc is not available.

Change is required for the alignment of Marvell Armada 38x
device trees present in sys/gnu/dts/arm - originally
the platform has 6 entries in simple-bus 'ranges'.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: manu, nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11876
2017-08-09 00:45:25 +00:00
Ian Lepore
b8c53507cb Remove the ds133x and s35390a i2c RTC drivers for now. They both do i2c
transfers in their probe() or attach() routines, and that doesn't work
when the low-level controller requires interrupts to be functional.

The DS133x family of chips is nearly identical to the DS1307 and support
for them should be added to that driver, then the ds133x driver can be
deleted.  The s35390a driver just needs a non-trivial workover.  In both
cases that work will be done and committed separately.
2017-08-08 22:58:34 +00:00
Kristof Provost
7f3ad01804 pf_get_sport(): Prevent possible endless loop when searching for an unused nat port
This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.

Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2

OpenBSD commit message:
 Use a 32 bit variable to detect integer overflow when searching for
 an unused nat port.  Prevents a possible endless loop if high port
 is 65535 or low port is 0.
 report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c

PR:		221201
Submitted by:	Fabian Keil <fk@fabiankeil.de>
Obtained from:  OpenBSD via ElectroBSD
MFC after:	1 week
2017-08-08 21:09:26 +00:00
Warner Losh
5b896b567d Turns out to be even simpler to just not create /dev/efi if we don't
have a efi runtime.
2017-08-08 21:01:11 +00:00
Warner Losh
9057f54d74 Fail to open efirt device when no EFI on system.
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
2017-08-08 20:44:16 +00:00
Alexander Motin
3a150601e1 Fix few issues of LinuxKPI workqueue.
LinuxKPI workqueue wrappers reported "successful" cancellation for works
already completed in normal way.  This change brings reported status and
real cancellation fact into sync.  This required for drm-next operation.

Reviewed by:	hselasky (earlier version)
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11904
2017-08-08 19:36:34 +00:00
John Baldwin
d081dfc7cd Fix a NULL pointer dereference in mly_user_command().
If mly_user_command fails to allocate a command slot it jumps to an 'out'
label used for error handling.  The error handling code checks for a data
buffer in 'mc->mc_data' to free before checking if 'mc' is NULL.  Fix by
just returning directly if we fail to allocate a command and only using
the 'out' label for subsequent errors when there is actual cleanup to
perform.

PR:		217747
Reported by:	PVS-Studio
Reviewed by:	emaste
MFC after:	1 week
2017-08-08 17:49:57 +00:00
Alan Somers
c45796d54e Make p1003_1b.aio_listio_max a tunable
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.

Reviewed by:	jhb, kib
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D11601
2017-08-08 16:14:31 +00:00
Warner Losh
d0e75394cf Use the correct queue depth for nda devices.
Submitted by: Matt Williams
2017-08-08 16:06:16 +00:00
Konstantin Belousov
16997138d3 Fix logic error in the the assert, causing the condition to be always true.
Also improve the formatting of the corresponding KASSERT message.

Based on the submission by:	Svyatoslav <razmyslov@viva64.com>
Found by:	PVS-Studio
PR:	217741
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2017-08-08 15:46:29 +00:00
Michael Gmelin
9c52035814 Fix typo in cyapa out of bounds check.
PR:		217783
Submitted by:	razmyslov@viva64.com
MFC after:	1 week
2017-08-08 13:27:32 +00:00
Hans Petter Selasky
8508e4d730 Make sure the received IP header gets 32-bit aligned for short packets
in the mlx5en(4) driver.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:49:36 +00:00
Hans Petter Selasky
869dd4b498 Count drop events due to lack of PCI bandwidth as queue drops and not as
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.

Submitted by:		gallatin@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:36:57 +00:00
Hans Petter Selasky
1a59bf5f7a Fix for mlx4en(4) to properly call m_defrag().
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.

This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.

Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.

The counters are per-ring to avoid excessive cache misses in the
TX path.

Submitted by:		mjoras@
Differential Revision:	https://reviews.freebsd.org/D11683
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:35:02 +00:00
Andriy Gapon
984d43cca5 MFV r322242: 8373 TXG_WAIT in ZIL commit path
illumos/illumos-gate@d28671a3b0
d28671a3b0

https://www.illumos.org/issues/8373
  The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign
  a transaction to a transaction group.  That seems to be logically
  incorrect as writing of the ZIL block does not introduce any new dirty
  data.  Also, when there is a lot of dirty data, the call can introduce
  significant delays into the ZIL commit path, thus affecting all
  synchronous writes. Additionally, ARC throttling may affect the ZIL
  writing.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 11:26:03 +00:00
Andriy Gapon
2653426e89 MFV r322240: 8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS
illumos/illumos-gate@79c2b812ee
79c2b812ee

https://www.illumos.org/issues/8491
  The zpool checkpoint feature in DxOS added a new field in the uberblock.
  The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the
  uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
  As these two changes come from two different sources and once upstreamed and
  deployed will introduce an incompatibility with each other we want
  to upstream a change that will reserve the padding for both of them so
  integration goes smoothly and everyone gets both features.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC after:	3 weeks
2017-08-08 11:21:58 +00:00
Andriy Gapon
6f2f8727e3 MFV r322238: 7915 checks in l2arc_evict could use some cleaning up
illumos/illumos-gate@267ae6c3a8
267ae6c3a8

https://www.illumos.org/issues/7915
  l2arc_evict() is strictly serialized with respect to
  l2arc_write_buffers() and l2arc_write_done().  Normally, l2arc_evict()
  and l2arc_write_buffers() are called from the same thread, so they can
  not be concurrent.  Also, l2arc_write_buffers() uses zio_wait() on the
  parent zio of all cache zio-s.  That ensures that l2arc_write_done()
  is completed before l2arc_write_buffers() returns.  Finally, if a
  cache device is removed, then l2arc_evict() is called under SCL_ALL in
  the exclusive mode.  That ensures that it can not be concurrent with
  the normal L2ARC accesses to the device (including writing and
  evicting buffers).  Given the above, some checks and actions in
  l2arc_evict() do not make sense.  For instance, it must never
  encounter the write head header let alone remove it from the buffer
  list.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 11:19:14 +00:00
Andriy Gapon
3cf2ea1aea MFV r322236: 8126 ztest assertion failed in dbuf_dirty due to dn_nlevels changing
illumos/illumos-gate@dcb6872c56
dcb6872c56

https://www.illumos.org/issues/8126
  The sync thread is concurrently modifying dn_phys->dn_nlevels
  while dbuf_dirty() is trying to assert something about it, without
  holding the necessary lock. We need to move this assertion further down
  in the function, after we have acquired the dn_struct_rwlock.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 11:14:40 +00:00
Andriy Gapon
98a9c8e68a zfs: no need for __DECONST after abd constification in r322233
Note that vdev_label_write_pad2() is FreeBSD specific.

MFC after:	2 weeks
X-MFC after:	r322233
2017-08-08 11:07:34 +00:00
Andriy Gapon
b9a4f29445 MFV r322232: 8426 mark immutable buffer arguments as such in abd.h
illumos/illumos-gate@9b195260e2
9b195260e2

https://www.illumos.org/issues/8426
  abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so
  qualify them with const.
  abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the
  corresponding arguments.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 10:59:18 +00:00
Andriy Gapon
9c48e95dd9 MFV r322229: 7600 zfs rollback should pass target snapshot to kernel
illumos/illumos-gate@77b171372e
77b171372e

https://www.illumos.org/issues/7600
  At present, the kernel side code seems to blindly rollback to whatever happens
  to be the latest snapshot at the time when the rollback task is processed.
  The expected target's name should be passed to the kernel driver and the sync
  task should validate that the target exists and that it is the latest snapshot
  indeed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	3 weeks
2017-08-08 10:52:01 +00:00
Andriy Gapon
8605a08bd2 MFV r322227: 8377 Panic in bookmark deletion
illumos/illumos-gate@42418f9e73
42418f9e73

https://www.illumos.org/issues/8377
  The problem is that when dsl_bookmark_destroy_check() is executed from open
  context (the pre-check), it fills in dbda_success based on the existence of the
  bookmark.
  But the bookmark (or containing filesystem as in this case) can be destroyed
  before we get to syncing context. When we re-run dsl_bookmark_destroy_check()
  in syncing
  context, it will not add the deleted bookmark to dbda_success, intending for
  dsl_bookmark_destroy_sync() to not process it. But because the bookmark is
  still in dbda_success
  from the open-context call, we do try to destroy it.
  The fix is that dsl_bookmark_destroy_check() should not modify dbda_success
  when called from open context.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:48:52 +00:00
Andriy Gapon
b4e4140d13 MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block
illumos/illumos-gate@b7edcb9408
b7edcb9408

https://www.illumos.org/issues/8378
  The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
  we then nopwrite against.
  zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
  to zgd_bp, dbuf_write_ready()
  could change db_blkptr, and dbuf_write_done() could remove the dirty record.
  dmu_sync() then sees the stale
  BP and that the dbuf it not dirty, so it is eligible for nop-writing.
  The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
  db_mtx. We could still see a stale
  db_blkptr, but if it is stale then the dirty record will still exist and thus
  we won't attempt to nopwrite.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:46:51 +00:00
Andriy Gapon
c6fb364293 MFV r322221: 7910 l2arc_write_buffers() may write beyond target_sz
FreeBD note: the essence of this change was committed to FreeBSD in
r314274.  This commit catches up with differences between what was
committed to FreeBSD and what was committed to OpenZFS, mainly more
logical variable names.

illumos/illumos-gate@16a7e5ac11
16a7e5ac11

https://www.illumos.org/issues/7910
  It seems that the change in issue #6950 resurrected the problem that was
  earlier fixed by the change in issue #5219.
  Please also see the following FreeBSD bug report:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 10:43:41 +00:00
Mark Johnston
c0589825fd Add round_jiffies_up(), local_clock() and __setup_timer() to the LinuxKPI.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11871
2017-08-08 04:34:02 +00:00
Mark Johnston
48dac28d63 Add macros for defining attribute groups and for WO and RW attributes.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11872
2017-08-08 04:30:22 +00:00
Marius Strobl
79f39c6aa1 - If available, use TRIM instead of ERASE for implementing BIO_DELETE.
This also involves adding a quirk table as TRIM is broken for some
  Kingston eMMC devices, though. Compared to ERASE (declared "legacy"
  in the eMMC specification v5.1), TRIM has the advantage of operating
  on write sectors rather than on erase sectors, which typically are
  of a much larger size. Thus, employing TRIM, we don't need to fiddle
  with coalescing BIO_DELETE requests that are also of (write) sector
  units into erase sectors, which might not even add up in all cases.
- For some SanDisk iNAND devices, the CMD38 argument, e. g. ERASE,
  TRIM etc., has to be specified via EXT_CSD[113], which now is also
  handled via a quirk.
- My initial understanding was that for eMMC partitions, the granularity
  should be used as erase sector size, e. g. 128 KB for boot partitions.
  However, rereading the relevant parts of the eMMC specification v5.1,
  this isn't actually correct. So drop the code which used partition
  granularities for delmaxsize and stripesize. For the most part, this
  change is a NOP, though, because a) for ERASE, mmcsd_delete() used
  the erase sector size unconditionally for all partitions anyway and
  b) g_disk_limit() doesn't actually take the stripesize into account.
- Take some more advantage of mmcsd_errmsg() in mmcsd(4) for making
  error codes human readable.
2017-08-07 23:33:05 +00:00
Warner Losh
36d6e01474 Eliminate useless adjustments of aliased device.
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.

Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
2017-08-07 22:42:46 +00:00
Warner Losh
d45e16744f Add nvd alias to nda ndoes.
All ndaX and ndaXpY nodes will appear as nvdX and nvdXpY as well
(through symlinks in devfs via the normal disk aliasing mechanism in
GEOM).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:43 +00:00
Warner Losh
d3517d306c Expose API to allow disks to ask for alias names in devfs.
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:38 +00:00
Warner Losh
5d7d13290a Add alias support to gpart.
When we're creating new providers for each of the partitions, add
aliases to the geom before we create the provider so when geom_dev
tastes the provider, the aliases are in place so the proper /dev
entries are created. So foo5p6 gets created as an alias for bar5p6
when foo is an alias for bar in the geom we're partitioning with
g_part. This also copies aliases from the container geom (eg disk) to
the label geom (the disk with GPT partitioning) so that aliases nest
properly.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:33 +00:00
Warner Losh
c624eb2598 Add aliasing concept to geom.
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:28 +00:00
Kirk McKusick
6c6118b390 gjournal is broken in handling its flush_queue. If we have 10 bio's
in the flush_queue:
         1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
         6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
         6 11 12 13 14  15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.

The fix is to place all new blocks at the end of the queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-07 19:40:03 +00:00
Kirk McKusick
683590b642 sysctl kern.geom.journal.cache.limit shows negative value for FreeBSD/amd64
system having over 4GB RAM. That's due to:

1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
   half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.

The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.

PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
2017-08-07 19:18:27 +00:00
Konstantin Belousov
fe04f5e9d0 Avoid DI recursion when reclaim_pv_chunk() is called from
pmap_advise() or pmap_remove().

Reported and tested by:	pho (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-07 17:29:54 +00:00
Konstantin Belousov
1a47eac0f5 Explain why delayed invalidation is not required in pmap_protect() and
pmap_remove_pages().

Submitted by:	alc
MFC after:	1 week
2017-08-07 17:23:10 +00:00
Alexander Motin
e1cf70fbab Fix hrtimer_active() in case of cancellation.
While there, switch to FreeBSD internal callout active status.

Reviewed by:	markj, hselasky
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11900
2017-08-07 14:34:05 +00:00
Ruslan Bukin
ca20f8ec29 o Replace __riscv__ with __riscv
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)

This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.

RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):

__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen

Reviewed by:	ngie
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11901
2017-08-07 14:09:57 +00:00
Navdeep Parhar
b96793ae43 cxgbe(4): Add the T6 and T5 Unified Wire configuration files to the
kernel, just like for T4, when the driver is compiled into the kernel.

Reported by:	mav@
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-07 14:04:19 +00:00