Create a wrapper for newbus to take giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D31831
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch. Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit. Mark that with a comment.
Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33263
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
In recent Hyper-V releases on Windows Server 2022, vPCI code does not
initialize the last 4 bit of device bar registers. This behavior change
could result weird problems cuasing PCI code failure when configuring
bars.
Just write all 1's to those bars whose probed values are not the same
as current read ones. This seems to make Hyper-V vPCI and
pci_write_bar() to cooperate correctly on these releases.
Reported by: khng@freebsd.org
Tested by: khng@freebsd.org
MFC after: 2 weeks
Sponsored by: Microsoft
This reverts commit 9ef7df022a ("hyperv: Register hyperv_timecounter
later during boot") and adds a comment explaining why the timecounter
needs to be registered as early as it is.
PR: 259878
Fixes: 9ef7df022a ("hyperv: Register hyperv_timecounter later during boot")
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Previously the MSR-based timecounter was registered during
SI_SUB_HYPERVISOR, i.e., very early during boot, and before SI_SUB_LOCK.
After commit 621fd9dcb2 this triggers a panic since the timecounter
list lock is not yet initialized.
The hyperv timecounter does not need to be registered so early, so defer
that to SI_SUB_DRIVERS, at the same time the hyperv TSC timecounter is
registered.
Reported by: whu
Approved by: whu
Fixes: 621fd9dcb2 ("timecounter: Lock the timecounter list")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31657
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.
Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The vmbus ISR needs to live in a trampoline. Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time. Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.
Rather than fight this, move the Hyper-V ISR to i386/exception.s. Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386. This is sufficient to make use of vmbus(4) after the
4/4 split. Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together. I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30577
Normally raw interrupt handler is provided by the kernel text. But
vmbus module registers its own handler that needs to be mapped into
userspace mapping on PTI kernels.
Reported and reviewed by: whu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30310
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Reviewed by: whu, imp
MFC after: 2 weeks
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D30124
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
That fixes disabled keyboard input after Xorg server has been stopped.
Reviewed by: whu
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28171
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However we need to
initialize it to zero in case older hosts which are not aware of this
field.
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
MFC after: 2 weeks
Sponsored by: Microsoft
Do not assume that VBE framebuffer metadata can be used. Like with the
EFI fb metadata, it may be null, so we should take care not to
dereference the null vbefb pointer. This avoids a panic when booting
-CURRENT on a gen1 VM in Azure.
Approved by: tsoome
Sponsored by: Miles AS
Differential Revision: https://reviews.freebsd.org/D27533
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.
struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.
Differential Revision: https://reviews.freebsd.org/D27373
The try lock loop in HN_LOCK put the thread spinning on cpu if the lock
is not available. It is possible to cause deadlock if the thread holding
the lock is sleeping. Relinquish the cpu to work around this problem even
it doesn't completely solve the issue. The priority inversion could cause
the livelock no matter how less likely it could happen. A more complete
solution may be needed in the future.
Reported by: Microsoft, Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
It is possible that the vmbus pcib channel is revoked during attach path.
The attach path could be waiting for response from host and this response will never
arrive since the channel has already been revoked from host point of view. Check
this situation during wait complete and return failed if this happens.
Reported by: Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D26486
In hv_storvsc_io_request() when coring, prevent changing of the send channel
from the base channel to another one. storvsc_poll always probes on the base
channel.
Based upon conversations with Microsoft, changed the handling of srb_status
codes. Most we should never get, others yes. All are treated as retry-able
except for two. We should not get these statuses, but if we ever do, the I/O
state is not known.
Submitted by: Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
Reviewed by: trasz, allanjude, whu
MFC after: 1 week
Sponsored by: Netapp Inc
Differential Revision: https://reviews.freebsd.org/D25756
On Gen2 VMs, Hyper-V provides mmio space for framebuffer.
This mmio address range is not useable for other PCI devices.
Currently only efifb driver is using this range without reserving
it from system.
Therefore, vmbus driver reserves it before any other PCI device
drivers start to request mmio addresses.
PR: 222996
Submitted by: weh@microsoft.com
Reported by: dmitry_kuleshov@ukr.net
Reviewed by: decui@microsoft.com
Sponsored by: Microsoft
This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.
Submitted by: Wei Hu <weh@microsoft.com>
Reviewed by: Dexuan Cui <decui@microsoft.com>
Relnotes: yes
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D24061
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
switch over to opt-in instead of opt-out for epoch.
Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.
Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.
Mark several tested drivers as IFF_KNOWSEPOCH.
Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674
supposedly may call into ether_input() without network epoch.
They all need to be reviewed before 13.0-RELEASE. Some may need
be fixed. The flag is not planned to be used in the kernel for
a long time.
This change is based on Linux commit 40630f462824ee. csio.resid should
account for transfer_len only for success and SRB_STATUS_DATA_OVERRUN
condition.
I am not sure how exactly this change works, but I have a report from a
user that they see lots of checksum errors when running a pool scrub
concurrently with iozone -l 1 -s 100G. After applying this patch the
problem cannot be reproduced.
Reviewed by: nobody
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D22312
r356087 made it rather innocuous to double-register built-in keyboard
drivers; we now set a flag to indicate that it's been registered and only
act once on a registration anyways. There is no misleading here, as the
follow-up kbd_delete_driver will actually remove the driver as needed now
that the linker set isn't also consulted after kbdinit.
Keyboard drivers are generally registered via linker set. In these cases,
they're also available as kmods which use KPI for registering/unregistering
keyboard drivers outside of the linker set.
For built-in modules, we still fire off MOD_LOAD and maybe even MOD_UNLOAD
if an error occurs, leading to registration via linker set and at MOD_LOAD
time.
This is a minor optimization at best, but it keeps the internal kbd driver
tidy as a future change will merge the linker set driver list into its
internal keyboard_drivers list via SYSINIT and simplify driver lookup by
removing the need to consult the linker set.
Most keyboard drivers are using the genkbd implementations as it is;
formally use them for any that aren't set and make
genkbd_get_fkeystr/genkbd_diag private.
A future change will provide default implementations for some of these where
it makes sense and most of them are already using the genkbd
implementation (e.g. get_fkeystr, diag).
These invocations were directly calling enkbd_diag(), rather than
indirection back through kbdd_diag/kbdsw. While they're functionally
equivent, invoking kbdd_diag where feasible (i.e. not in a diag
implementation) makes it easier to visually identify locking needs in these
other drivers.
A SIM-private field is used for that.
The pointer can be useful when examining a state of a queued ccb.
E.g., a ccb on a da_softc.pending_ccbs.
MFC after: 2 weeks
Add VMBus protocol version 4.0. and 5.0 to support Windows 10 and newer HyperV hosts.
For VMBus 4.0 and newer HyperV, the netvsc gpadl teardown must be done after vmbus close.
Submitted by: whu
MFC after: 2 weeks
Sponsored by: Microsoft
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().
Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.
Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.
Update the nearby comment for UMA_SLAB_KERNEL.
Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)
Reviewed by: kib, markj
Discussed with: jeff, re@
Differential Revision: https://reviews.freebsd.org/D16825
Summary:
Base gcc fails to compile `sys/dev/hyperv/pcib/vmbus_pcib.c` for i386,
with the following -Werror warnings:
cc1: warnings being treated as errors
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'new_pcichild_device':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:567: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_on_channel_callback':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:940: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_protocol_negotiation':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1012: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_enter_d0':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1073: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_send_resources_allocated':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1125: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_map_msi':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1730: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
This is because on i386, several casts from `uint64_t` to a pointer
reduce the value from 64 bit to 32 bit.
For gcc, this can be fixed by an intermediate cast to uintptr_t. Note
that I am assuming the incoming values will always fit into 32 bit!
Differential Revision: https://reviews.freebsd.org/D15753
MFC after: 3 days
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access. Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13838
FreeBSD VM can't boot up on Hyper-V after the recent malloc change in
r335068: Make UMA and malloc(9) return non-executable memory in most cases.
The hypercall page here must be executable.
Fix the boot-up issue by adding M_EXEC.
PR: 229167
Sponsored by: Microsoft
omcast was used without being initialized in the non-multicast case.
The only effect was that the interface's multicast output counter could be
incorrect.
Reported by: Coverity
CID: 1379662
MFC after: 3 days
Sponsored by: Dell EMC
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.
By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap. The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.
There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.
copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline. If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.
The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done. The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging. I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.
Tested by: pho
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14633
CAM_SEL_TIMEOUT was introduced in
https://reviews.freebsd.org/D7521 (r304251), which claimed:
"VM shall response to CAM layer with CAM_SEL_TIMEOUT to filter those
invalid LUNs. Never use CAM_DEV_NOT_THERE which will block LUN scan
for LUN number higher than 7."
But it turns out this is not correct:
I think what really filters the invalid LUNs in r304251 is that:
before r304251, we could set the CAM_REQ_CMP without checking
vm_srb->srb_status at all:
ccb->ccb_h.status |= CAM_REQ_CMP.
r304251 checks vm_srb->srb_status and sets ccb->ccb_h.status properly,
so the invalid LUNs are filtered.
I changed my code version to r304251 but replaced the CAM_SEL_TIMEOUT
with CAM_DEV_NOT_THERE, and I confirmed the invalid LUNs can also be
filtered, and I successfully hot-added and hot-removed 8 disks to/from
the VM without any issue.
CAM_SEL_TIMEOUT has an unwanted side effect -- see cam_periph_error():
For a selection timeout, we consider all of the LUNs on
the target to be gone. If the status is CAM_DEV_NOT_THERE,
then we only get rid of the device(s) specified by the
path in the original CCB.
This means: for a VM with a valid LUN on 3:0:0:0, when the VM inquires
3:0:0:1 and the host reports 3:0:0:1 doesn't exist and storvsc returns
CAM_SEL_TIMEOUT to the CAM layer, CAM will detech 3:0:0:0 as well: this
is the bug I reported recently:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226583
PR: 226583
Reviewed by: mav
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D14690
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability. PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.
The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:
kernel text (but not modules text)
PCPU
GDT/IDT/user LDT/task structures
IST stacks for NMI and doublefault handlers.
Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords. Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.
IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.
On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed. The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame. Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().
Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps. They are registered using
pmap_pti_add_kva(). Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use. In principle, they can be installed into kernel page table per
pmap with some work. Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.
PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation. I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.
The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case. Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled. PCID can be forced on only for developer's benefit.
MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal. The fix is pending.
Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
This provides a nice wrarpper around the XPT_PATH_INQ ccb creation and
calling.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13387
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Background:
- UDP 4-tuple hash type is unconditionally enabled in Hyper-V on WS2016,
which is _not_ affected by NDIS_OBJTYPE_RSS_PARAMS.
- Non-fragment UDP/IPv4 datagrams' hash type is delivered to VM as
TCP_IPV4.
Currently this erroneous behavior only applies to WS2016/Windows10.
Force l3/l4 protocol check, if the RXed packet's hash type is TCP_IPV4,
and the Hyper-V is running on WS2016/Windows10. If the RXed packet is
UDP datagram, adjust mbuf hash type to UDP_IPV4.
MFC after: 3 days
Sponsored by: Microsoft
Event tasks are pinned to their respective CPU by default, in the same
fashion as they were.
Unpin the event tasks by setting hw.vmbus.pin_evttask to 0, if certain
CPUs serve special purpose.
MFC after: 3 days
Sponsored by: Microsoft
UDP checksum offload does not work in Azure if following conditions are
met:
- sizeof(IP hdr + UDP hdr + payload) > 1420.
- IP_DF is not set in IP hdr
Use software checksum for UDP datagrams falling into this category.
Add two tunables to disable UDP/IPv4 and UDP/IPv6 checksum offload, in
case something unexpected happened.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12429
- Add size of an ethernet header to the value configured to NVS. This
does not seem to have any effects if MTU is 1500, but fix hypervisor
side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
side.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12352
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12333
This helps to detect when UDP hash types can be supported.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12177
The conditional compiling in the review request is removed, since
these IOCTLs will be available in stable/10 and stable/11.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12175
Do this even for non-transparent mode VF. Better safe than sorry.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11981
- Update hn(4)'s stats properly for non-transparent mode VF.
- Allow BPF tapping to hn(4) for non-transparent mode VF.
- Don't setup mbuf hash, if 'options RSS' is set.
In Azure, when VF is activated, TCP SYN and SYN|ACK go through hn(4)
while the rest of segments and ACKs belonging to the same TCP 4-tuple
go through the VF. So don't setup mbuf hash, if a VF is activated
and 'options RSS' is not enabled. hn(4) and the VF may use neither
the same RSS hash key nor the same RSS hash function, so the hash
value for packets belonging to the same flow could be different!
- Disable LRO.
hn(4) will only receive broadcast packets, multicast packets, TCP
SYN and SYN|ACK (in Azure), LRO is useless for these packet types.
For non-transparent, we definitely _cannot_ enable LRO at all, since
the LRO flush will use hn(4) as the receiving interface; i.e.
hn_ifp->if_input(hn_ifp, m).
While I'm here, remove unapplied comment and minor style change.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11978
While, I'm here add comment about why updating VF's imcast stat is
not necessary.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11948
How network VF works with hn(4) on Hyper-V in transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is attached, the cooresponding hn(4) waits several
seconds to make sure that the network VF attach routing completes, then:
o Set the intersection of the network VF's if_capabilities and the
cooresponding hn(4)'s if_capabilities to the cooresponding hn(4)'s
if_capabilities. And adjust the cooresponding hn(4) if_capable and
if_hwassist accordingly. (*)
o Make sure that the cooresponding hn(4)'s TSO parameters meet the
constraints posed by both the network VF and the cooresponding hn(4).
(*)
o The network VF's if_input is overridden. The overriding if_input
changes the input packet's rcvif to the cooreponding hn(4). The
network layers are tricked into thinking that all packets are
neceived by the cooresponding hn(4).
o If the cooresponding hn(4) was brought up, bring up the network VF.
The transmission dispatched to the cooresponding hn(4) are
redispatched to the network VF.
o Bringing down the cooresponding hn(4) also brings down the network
VF.
o All IOCTLs issued to the cooresponding hn(4) are pass-through'ed to
the network VF; the cooresponding hn(4) changes its internal state
if necessary.
o The media status of the cooresponding hn(4) solely relies on the
network VF.
o If there are multicast filters on the cooresponding hn(4), allmulti
will be enabled on the network VF. (**)
- Once the network VF is detached. Undo all damages did to the
cooresponding hn(4) in the above item.
NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled. The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1. The network
VF transparent mode is _not_ enabled by default, as of this commit.
The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.
The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.
(*)
These decisions were made so that things will not be messed up too much
during the transition period.
(**)
This does _not_ need to go through the fancy multicast filter management
stuffs like what vlan(4) has, at least currently:
- As of this write, multicast does not work in Azure.
- As of this write, multicast packets go through the cooresponding hn(4).
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11803
This prepares for the upcoming transparent VF support.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11708
This status will be reported if the backend NIC is wireless; it's not
useful. Due to the high frequency of the reporting, this could be
pretty annoying; ignore it.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11651
The VF-HN map will be used later on to implement "transparent VF".
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11618
This unbreaks the CDROM attaching on GEN2 VMs. On GEN1 VMs, CDROM is
attached to emulated ATA controller.
PR: 220790
Submitted by: Hongjiang Zhang <honzhan microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11634
--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.
--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.
--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h
--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)
Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729
On some windows hosts TEST_UNIT_READY command will return
SRB_STATUS_ERROR and sense data "NOT READY asc:3a,1 (Medium
not present - tray closed)", this occurs periodically, and
not hurt anything else. So, we prefer to ignore this kind
of errors.
PR: 219973
Submitted by: Hongjiang Zhang <hongzhan microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11271