There have been many changes to rack over the last couple of years, including:
a) Ability when switching stacks to have one stack query another.
b) Internal use of micro-second timers instead of ticks.
c) Many changes to pacing in the form of
1) Improvements to Dynamic Goodput Pacing (DGP)
2) Improvements to fixed rate pacing
3) A new feature called hybrid pacing where the requestor can
get a combination of DGP and fixed rate pacing with deadlines
for delivery that can dynamically speed things up.
d) All kinds of bugs found during extensive testing and use of the
rack stack for streaming video and in fact all data transferred
by NF
Reviewed by: glebius, gallatin, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D39402
Stack switching has always been a bit of an issue. We currently use a break-before-make setup, which means that
if something goes wrong you have to try to get back to a stack. This patch, among a lot of other things, changes that so
that it is make-before-break. We also expand some of the function blocks in preparation for new features in rack that will allow
more controlled pacing. We also add other abilities, such as a pathway for a stack to query the previous stack and acquire
critical state information from it, so that things in flight don't get dropped or mishandled when switching stacks. We also add the
concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds, and
it even gives us a pathway to nanosecond timekeeping if we need it (something for the data center to consider
for sure).
Once all this lands I will then update rack to begin using all these new features.
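The make-before-break ordering described above can be sketched in a few lines of C. This is an illustration only: struct stack_ops, struct conn, switch_stack() and the example stacks are hypothetical names invented for the sketch, not the kernel's actual function-block interface.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a TCP stack's function block. */
struct stack_ops {
	const char *name;
	int  (*init)(void **statep);	/* returns 0 on success */
	void (*fini)(void *state);
};

struct conn {
	const struct stack_ops *ops;	/* stack currently in use */
	void *state;			/* that stack's private state */
};

/*
 * Make before break: bring the new stack up first and tear the old one
 * down only once the new one is known to be usable.  On failure the
 * connection is left untouched on its original stack.
 */
int
switch_stack(struct conn *c, const struct stack_ops *new_ops)
{
	void *new_state = NULL;
	int error;

	error = new_ops->init(&new_state);
	if (error != 0)
		return (error);		/* old stack is still intact */

	c->ops->fini(c->state);		/* break only after the make */
	c->ops = new_ops;
	c->state = new_state;
	return (0);
}

/* Example stacks used to exercise the sketch (also hypothetical). */
int  ok_init(void **statep) { static int dummy; *statep = &dummy; return (0); }
int  fail_init(void **statep) { (void)statep; return (12); }
void noop_fini(void *state) { (void)state; }

const struct stack_ops stack_a = { "a", ok_init, noop_fini };
const struct stack_ops stack_b = { "b", ok_init, noop_fini };
const struct stack_ops stack_broken = { "broken", fail_init, noop_fini };
```

With break-before-make the fini call would come first, so a failing init would leave the connection with no stack at all; here a failed switch to stack_broken simply returns the error and the connection stays on its previous stack.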
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39210
Notable upstream pull request merges:
#12194 Fix short-lived txg caused by autotrim
#13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
#13392 Implementation of block cloning for ZFS
#13741 SHA2 reworking and API for iterating over multiple implementations
#14282 Sync thread should avoid holding the spa config write lock
when possible
#14283 txg_sync should handle write errors in ZIL
#14359 More adaptive ARC eviction
#14469 Fix NULL pointer dereference in zio_ready()
#14479 zfs redact fails when dnodesize=auto
#14496 improve error message of zfs redact
#14500 Skip memory allocation when compressing holes
#14501 FreeBSD: don't verify recycled vnode for zfs control directory
#14502 partially revert PR 14304 (eee9362a7)
#14509 Fix per-jail zfs.mount_snapshot setting
#14514 Fix data race between zil_commit() and zil_suspend()
#14516 System-wide speculative prefetch limit
#14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
#14519 Do not hold spa_config in ZIL while blocked on IO
#14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
#14524 Ignore too large stack in case of dsl_deadlist_merge
#14526 Use .section .rodata instead of .rodata on FreeBSD
#14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
#14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
#14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
#14544 icp: Prevent compilers from optimizing away memset()
in gcm_clear_ctx()
#14546 Revert zfeature_active() to static
#14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
patch
#14563 Optimize the is_l2cacheable functions
#14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
#14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
#14567 spl: Add cmn_err_once() to log a message only on the first call
#14568 Fix incremental receive silently failing for recursive sends
#14569 Restore ASMABI and other Unify work
#14576 Fix detection of IBM Power8 machines (ISA 2.07)
#14577 Better handling for future crypto parameters
#14600 zcommon: Refactor FPU state handling in fletcher4
#14603 Fix prefetching of indirect blocks while destroying
#14633 Fixes in persistent error log
#14639 FreeBSD: Remove extra arc_reduce_target_size() call
#14641 Additional limits on hole reporting
#14649 Drop lying to the compiler in the fletcher4 code
#14652 panic loop when removing slog device
#14653 Update vdev state for spare vdev
#14655 Fix cloning into already dirty dbufs
#14678 Revert "Do not hold spa_config in ZIL while blocked on IO"
Obtained from: OpenZFS
OpenZFS commit: 431083f75b
Add opt_netlink.h to the linux_common module. On i386, where we don't
use the linux_common module, move the opt_netlink.h inclusion under the
i386 condition.
MFC after: 2 weeks
This is a direct port of the Linux code, as the licence allows it, so
style(9) isn't respected in order to allow applying upstream commits directly.
Do not add it to linuxkpi directly, but add a new linuxkpi_hdmi module
that drm modules will require later; there is no need to bloat linuxkpi more.
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39122
This change does the following:
Base Netlink KPIs (ability to register the family, parse and/or
write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
(netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in netlink_glue.c,
but their implementation calls a pointer to either the stub function
or the actual function, depending on whether the module is loaded or not.
This approach keeps only about 1k LoC out of ~3.7k LoC (the current
sys/netlink implementation) in the kernel, and that part will not grow further.
It also allows generic netlink kernel consumers to load
successfully without requiring the Netlink module and to operate correctly
once the Netlink module is loaded.
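The stub-or-real dispatch described above follows a common pattern, sketched here in plain C. All names (struct kpi_ops, kpi_set_ops(), kpi_send_msg(), module_ops) are hypothetical and chosen for illustration; they are not the identifiers used in netlink_glue.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of the KPI entry points provided by the module. */
struct kpi_ops {
	int (*send_msg)(const void *buf, size_t len);
};

/* Stub used while the module is not loaded: fail cleanly. */
static int
stub_send_msg(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return (ENOTSUP);
}

static const struct kpi_ops stub_ops = { .send_msg = stub_send_msg };

/* Always points at a valid table, so callers never need a NULL check. */
static const struct kpi_ops *cur_ops = &stub_ops;

/* Called by the module on load (with its table) and on unload (NULL). */
void
kpi_set_ops(const struct kpi_ops *ops)
{
	cur_ops = (ops != NULL) ? ops : &stub_ops;
}

/* The always-present KPI function: a thin wrapper over the pointer. */
int
kpi_send_msg(const void *buf, size_t len)
{
	return (cur_ops->send_msg(buf, len));
}

/* What a loaded module might supply (hypothetical implementation). */
static int
real_send_msg(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return (0);
}

const struct kpi_ops module_ops = { .send_msg = real_send_msg };
```

A consumer linked against kpi_send_msg() loads regardless of whether the module is present; calls fail with an error until the module installs its table, and work normally afterwards.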
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39269
clang doesn't implement this warning, so violations are only caught by
GCC. It is also no longer a common practice to use this as it was in
the original BSD code, so the need for the warning is not as important
as when it was used to do cleanups 20 years ago. A recent commit
(c3179891f8) triggers this warning on
GCC, but that commit uses nested externs purposefully.
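For reference, a nested extern is an extern declaration placed inside a function body instead of at file scope. The following minimal sketch shows the construct; it assumes the warning in question is GCC's -Wnested-externs, and the names counter and next_value() are made up for the example.

```c
/* Defined at file scope; in real code often in another source file. */
int counter = 41;

int
next_value(void)
{
	/*
	 * A nested extern: the declaration lives inside the function
	 * rather than at file scope or in a header.  This is valid C,
	 * but GCC reports it when -Wnested-externs is enabled.
	 */
	extern int counter;

	return (counter + 1);
}
```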
Reviewed by: markj, emaste
Differential Revision: https://reviews.freebsd.org/D39214
This standalone module is the last vestige of ATM support in the tree so
send it on its way.
Reviewed by: manu, emaste
Relnotes: yes
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38880
Most ATM support was removed prior to FreeBSD 12. The netgraph support
was kept as it was less intrusive, but it is presumed to be unused.
Reviewed by: manu
Relnotes: yes
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38879
This reduces some duplication between the existing arm64 + x86 section
and the powerpc64 section. To make the diff simpler, enable mlx4 on
powerpc64 since it compiles.
Reviewed by: pkubaj, imp, emaste
Differential Revision: https://reviews.freebsd.org/D38973
kmod.mk appends the value of SRCS.${KERN_OPT} for each defined kernel
option to SRCS. This helper is shorter than appending to SRCS under
explicit checks on KERN_OPTS.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38738
ACPI is not handled specially by sys/conf/kern.opts.mk (unlike a few
options), so we should fall back on the generic behavior of
sys/conf/config.mk, which pulls from all the generated opt*.h files,
including opt_acpi.h, which will cause DEV_ACPI to be included in
KERN_OPTS. Then the generic machinery in sys/conf/kmod.mk will cause
SRCS.DEV_ACPI to be included in SRCS when appropriate.
Reviewed by: jhb, imp
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D38737
A subsequent commit will instead use existing infrastructure to
exclude the files from hwpmc.ko for non-ACPI builds. Note that the
original commit left the files as optional in sys/conf/files.arm64.
This reverts commit 751d88119f.
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38736
Summary:
This review ports mlx5 driver, kernel's OFED stack (userland is already enabled), KTLS and krping to powerpc64 and powerpc64le.
krping requires a small change since it uses assembly for amd64 / i386.
NOTE: On powerpc64le RDMA works fine in the userspace with libmlx5, but on powerpc64 it does not. The problem is that contrib/ofed/libmlx5/doorbell.h checks for SIZEOF_LONG but this macro exists on neither powerpc64* nor amd64. Thus, the file silently goes to the fallback function written for 32-bit architectures. It works fine on little-endian architectures, but causes a hard fail on big-endian. It's possible it may also cause some runtime issues on little-endian.
Thus, on powerpc64 I verified that RDMA works with krping.
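The silent fallback mentioned in the NOTE comes from how the C preprocessor treats undefined identifiers in #if: they evaluate to 0 without any diagnostic. The sketch below shows the effect in isolation. It is not the contents of doorbell.h; the macro and function names other than SIZEOF_LONG are hypothetical, and the ULONG_MAX test is only one possible alternative.

```c
#include <limits.h>

/*
 * SIZEOF_LONG is deliberately NOT defined here, mirroring the situation
 * in the NOTE.  Inside #if an undefined identifier evaluates to 0, so
 * the comparison is false and the fallback branch is selected with no
 * warning (unless the compiler is run with -Wundef).
 */
#if SIZEOF_LONG == 8
#define	USES_64BIT_PATH	1
#else
#define	USES_64BIT_PATH	0	/* fallback meant for 32-bit targets */
#endif

/* An alternative test that relies only on a standard <limits.h> macro. */
#if ULONG_MAX > 0xffffffffUL
#define	LONG_IS_64BIT	1
#else
#define	LONG_IS_64BIT	0
#endif

int
uses_64bit_path(void)
{
	return (USES_64BIT_PATH);
}

int
long_is_64bit(void)
{
	return (LONG_IS_64BIT);
}
```

On an LP64 platform uses_64bit_path() returns 0 even though long_is_64bit() returns 1, which is the mismatch the NOTE describes.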
Reviewers: #powerpc, hselasky
Subscribers: bdrewery, imp, emaste, jhibbits
Differential Revision: https://reviews.freebsd.org/D38786
Notable changes include:
- DSCP QoS Support (leveraging support added in
rG9c950139051298831ce19d01ea5fb33ec6ea7f89)
- Improved PFC handling and TC queue assignments (now all remaining
queues are assigned to TC 0 when more than one TC is enabled and the
number of available queues does not evenly divide between them)
- Support for dumping the internal FW state for additional debugging by
Intel support
- Support for allowing "No FEC" to be a valid state for the LESM to
negotiate when using non-standard compliant modules
Also includes various bug fixes and smaller enhancements.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: erj@
Tested by: Jeff Pieper <jeffrey.pieper@intel.com>
MFC after: 3 days
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D38109
Since the Linux emulation layer build options were removed, there is no reason
to keep opt_compat.h.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D38548
MFC after: 2 weeks
This driver is based on the enic (Cisco VIC) DPDK driver. It provides
basic ethernet functionality. It has been run with various VIC cards to
do UEFI PXE boot with NFS root.
Notable upstream pull request merges:
#13805 Configure zed's diagnosis engine with vdev properties
#14110 zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
#14121 Batch enqueue/dequeue for bqueue
#14123 arc_read()/arc_access() refactoring and cleanup
#14159 Bypass metaslab throttle for removal allocations
#14243 Implement uncached prefetch
#14251 Cache dbuf_hash() calculation
#14253 Allow receiver to override encryption property in case of replication
#14254 Restrict visibility of per-dataset kstats inside FreeBSD jails
#14255 Zero end of embedded block buffer in dump_write_embedded()
#14263 Cleanups identified by CodeQL and Coverity
#14264 Miscellaneous fixes
#14272 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
#14287 FreeBSD: Remove stray debug printf
#14288 Colorize zfs diff output
#14289 deadlock between spa_errlog_lock and dp_config_rwlock
#14291 FreeBSD: Fix potential boot panic with bad label
#14292 Add tunable to allow changing micro ZAP's max size
#14293 Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
#14295 zed: add hotplug support for spare vdevs
#14304 Activate filesystem features only in syncing context
#14311 zpool: do guid-based comparison in is_vdev_cb()
#14317 Pack zrlock_t by 8 bytes
#14320 Update arc_summary and arcstat outputs
#14328 FreeBSD: catch up to 1400077
#14376 Use setproctitle to report progress of zfs send
#14340 Remove some dead ARC code
#14358 Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
#14360 libzpool: fix ddi_strtoull to update nptr
#14364 Fix unprotected zfs_znode_dmu_fini
#14379 zfs_receive_one: Check for the more likely error first
#14380 Cleanup of dead code suggested by Clang Static Analyzer
#14397 Avoid passing an uninitialized index to dsl_prop_known_index
#14404 Fix reading uninitialized variable in receive_read
#14407 free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
#14418 Introduce minimal ZIL block commit delay
#14422 x86 assembly: fix .size placement and replace .align with .balign
Obtained from: OpenZFS
OpenZFS commit: 9cd71c8604
This updated DDP is intended to be used with the forthcoming ice(4)
driver update to 1.37.7-k. (But it will still work with the current
version.)
Co-authored-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
MFC after: 1 week
Sponsored by: Intel Corporation
Overview:
Intel(R) QuickAssist Technology (Intel(R) QAT) provides hardware
acceleration for offloading security, authentication and compression
services from the CPU, thus significantly increasing the performance and
efficiency of standard platform solutions.
This commit introduces:
- Intel® 4xxx Series platform support.
- QuickAssist kernel API implementation update for Generation 4 device.
Enabled services: symmetric cryptography and data compression.
- Increased default number of crypto instances in static configuration
for performance purposes.
OCF backend changes:
- changed GCM/CCM MAC validation policy to generate MAC by HW
and validate by SW due to the QAT HW limitations.
Patch co-authored by: Krzysztof Zdziarski <krzysztofx.zdziarski@intel.com>
Patch co-authored by: Michal Jaraczewski <michalx.jaraczewski@intel.com>
Patch co-authored by: Michal Gulbicki <michalx.gulbicki@intel.com>
Patch co-authored by: Julian Grajkowski <julianx.grajkowski@intel.com>
Patch co-authored by: Piotr Kasierski <piotrx.kasierski@intel.com>
Patch co-authored by: Adam Czupryna <adamx.czupryna@intel.com>
Patch co-authored by: Konrad Zelazny <konradx.zelazny@intel.com>
Patch co-authored by: Katarzyna Rucinska <katarzynax.kargol@intel.com>
Patch co-authored by: Lukasz Kolodzinski <lukaszx.kolodzinski@intel.com>
Patch co-authored by: Zbigniew Jedlinski <zbigniewx.jedlinski@intel.com>
Sponsored by: Intel Corporation
Reviewed by: markj, jhb
Differential Revision: https://reviews.freebsd.org/D36254
Simply put, WDAT is an abstraction for the real WDT hardware. For
instance, to add a newer generation WDT to ichwd(4), one must know the
detailed hardware registers, etc.
With WDAT, the IO accesses needed to operate the WDT are described
comprehensively by the table itself, so no hardware knowledge is required.
With this driver, the WDTs on Advantech ARK-1124C, Dell R210 and Dell R240 are
detected and operated flawlessly.
* While the R210 is also supported by ichwd(4), the others are not yet supported by it.
The unfortunate thing is that not all systems have WDAT defined.
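The idea of a table-described watchdog can be sketched as a tiny interpreter that walks a list of register accesses. This is a simplified illustration under stated assumptions: the instruction kinds, the struct layout, the simulated register array and all names are hypothetical, and none of it reproduces the actual ACPI WDAT encoding or this driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction kinds; not the ACPI encoding. */
enum wd_op { WD_WRITE_VALUE, WD_WRITE_COUNTDOWN };

struct wd_instr {
	enum wd_op op;
	unsigned   reg;		/* index into the simulated registers */
	uint32_t   value;	/* constant written by WD_WRITE_VALUE */
	uint32_t   mask;	/* bits of the register this entry owns */
	int        preserve;	/* keep bits outside the mask if nonzero */
};

/* Simulated hardware registers standing in for real IO accesses. */
uint32_t wd_regs[4];

/* Run one table-described action; no chip-specific logic is needed. */
void
wd_run(const struct wd_instr *tbl, size_t n, uint32_t countdown)
{
	for (size_t i = 0; i < n; i++) {
		const struct wd_instr *in = &tbl[i];
		uint32_t v = (in->op == WD_WRITE_COUNTDOWN) ?
		    countdown : in->value;
		uint32_t old = in->preserve ?
		    (wd_regs[in->reg] & ~in->mask) : 0;

		wd_regs[in->reg] = old | (v & in->mask);
	}
}

/*
 * Example action: write the countdown to register 1, then set the
 * enable bit in register 0 while preserving its other bits.
 */
const struct wd_instr wd_set_timeout[] = {
	{ WD_WRITE_COUNTDOWN, 1, 0,   0x3ff, 0 },
	{ WD_WRITE_VALUE,     0, 0x1, 0x1,   1 },
};
```

Supporting a different board in this model means supplying a different table rather than new driver logic, which is the property the description above attributes to WDAT.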
Submitted by: t_uemura at macome.co.jp
Reviewed by: hrs
Differential Revision: https://reviews.freebsd.org/D37493
In 0a9a4d2cd6 a check for OPT_ACPI was added to the hwpmc Makefile
to fix loading the module in a kernel where ACPI has been disabled.
This broke loading the module when ACPI was enabled in the build, because
OPT_ACPI isn't a Makefile macro and so the check always treated ACPI as disabled.
Move this check to the C files where the DEV_ACPI macro does exist.
Reviewed by: gnn
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D37773
Silence -Winfinite-recursion for ldo.c in lua and -Wstringop-overread
for nvpair.c.
Reviewed by: mm
Differential Revision: https://reviews.freebsd.org/D37631
This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket. Also the tcp::debug DTrace probes look at
this flag on a socket.
Reviewed by: gnn, tuexen
Discussed with: rscheff, rrs, jtl
Differential Revision: https://reviews.freebsd.org/D37694