particular, the Giant is supposed to protect against parallel
ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does
copyout(9) under Giant and then examines time_status to return syscall
result. Since copyout(9) could sleep, the syscall result might be
inconsistent.
Another and more important issue is that if PPS is configured,
hardpps(9) is executed without any protection against the parallel
top-level code invocation. Potentially, this may result in the
inconsistent state of the ntptime state variables, but I cannot say
how serious such distortion is. The non-functional splclock() call in
sys_ntp_adjtime() protected against clock interrupts calling hardpps()
in the pre-SMP era.
Modernize the locking. A mutex protects ntptime data. Due to the
hardpps() KPI legitimately serving from the interrupt filters (and
e.g. uart(4) does call it from filter), the lock cannot be sleepable
mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex
to reduce interrupt latency.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6825
private mtx in resettodr(), no implementation of CLOCK_SETTIME() is
allowed to sleep.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
X-Differential revision: https://reviews.freebsd.org/D6825
TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as
if TDF_SBDRY is not set for STOP signal delivery. The only difference
is that sig_suspend_threads() should abort the sleep instead of doing
immediate suspension.
Reported by: ngie
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Approved by: re (gjb)
structure, change it to int.
The real fix is to sanitize user-visible definitions in sys/event.h,
e.g. the affected struct knlist is of no use for userspace programs.
Reported and tested by: jkim
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
sleepable scan, iteration over the shadow chain looking for a page
could find an OBJ_DEAD object. Such state of the mapping is only
transient, the dead object will be terminated and removed from the
chain shortly. We must not return KERN_PROTECTION_FAILURE unless the
object type is changed to OBJT_DEAD in the chain, indicating that
paging on this address is really impossible. Returning
KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGV delivered
to processes, or kernel accesses to UVA spuriously failing with
EFAULT.
If the object with OBJ_DEAD flag is found, only return
KERN_PROTECTION_FAILURE when object type is already OBJT_DEAD.
Otherwise, sleep a tick and retry the fault handling.
Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by
waiting until the paging on this object is finished. But to do so, we
need to reference the dead object, while vm_object_collapse() insists
on owning the final reference on the collapsed object. This could be
fixed by e.g. changing the assert to shared reference release between
vm_fault() and vm_object_collapse(), but it seems to be too much
complications for rare boundary condition.
PR: 204426
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
X-Differential revision: https://reviews.freebsd.org/D6085
MFC after: 2 weeks
Approved by: re (gjb)
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.
Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function. The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty. As result,
the knlist_remove_inevent() is no longer needed and removed.
Other changes:
In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications. The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.
Fix immediate note activation in filt_procattach(). Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.
In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken. Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).
Some minor and incomplete style fixes.
Analyzed and tested by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6859
interrupt sleeps with the ERESTART on the suspension attempts.
Otherwise, single-threading requests are deferred until the locks are
granted for NFS files, which causes hangs.
When retrying local registration of the remotely-granted adv lock,
allow full suspension and check for suspension, for usual reasons.
Reported by: markj, pho
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Most of the effect of setting MSR[SF] is that the CPU will stop ignoring
the high 32 bits of registers containing addresses in load/store
instructions. As such, the kernel was setting it only when it began to
need access to high memory. MSR[SF] also affects the operation of some
conditional instructions, however, and so setting it at late times could
subtly break code at very early times. This fixes use of the FDT mode in
loader, and FDT boot more generally, on 64-bit PowerPC systems.
Hardware provided by: IBM LTC
Approved by: re (kib)
[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.
[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.
Reported by: Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by: kib@
Approved by: re (gjb, kib)
MFC after: 3 days
for messages which have been put on the send queue:
* Do not report any DATA or I-DATA chunk padding.
* Correctly deal with the I-DATA chunk header instead of the DATA
chunk header when the I-DATA extension is used.
Approved by: re (kib)
MFC after: 1 week
avos@ pointed out to me that this broke IBSS merging because the rest of
the input path no longer was called for non-IBSS frames.
I committed a change to not input non-IBSS frames, which stopped
nodes being created for BSSes that weren't ours. Unfortunately
thta stopped the input path for non-IBSS frames in general,
so the management input path didn't work.
So, I'll revert this until I come up with a better solution.
(Hopefully before 11.)
Reviewed by: avos
Approved by: re (gjb)
executed by inactive methods, must be repeated on reclaim. In
particular, unlink and free sillyrenamed vnode both on inactivation
and reclaim.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
lists must be functional.
Reported by: Daniel Engberg <daniel.engberg.lists@pyret.net>,
Guy Yur <guyyur@gmail.com>
Tested by: Guy Yur <guyyur@gmail.com>
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb), including the KBI change
pci_if.
This allows bhnd(4) to manage per-device state (such as per-core
pmu/clock refcounting) on behalf of subclass driver instances.
Approved by: re (gjb), adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D6959
The delta between SENTRY5 and BCM was already small due to BCM being
derived from SENTRY5; re-integrating the two avoids the maintenance
overhead of keeping them both in sync with bhnd(4) changes.
- Re-integrate minor SENTRY5 deltas in bcm_machdep.c
- Modify uart_cpu_chipc to allow specifying UART debug/console flags via
kenv and device hints.
- Switch SENTRY5 to std.broadcom
- Enabled CFI flash support for SENTRY5
Reviewed by: Michael Zhilin <mizkha@gmail.com> (Broadcom MIPS support)
Approved by: re (gjb), adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D6897
Replaces use of DEVICE_IDENTIFY with explicit enumeration of chipc
child devices using the chipc capability structure.
This is a precursor to PMU support, which requires more complex resource
assignment handling than achievable with the static device name-based
hints table.
Reviewed by: Michael Zhilin <mizkha@gmail.com> (Broadcom MIPS support)
Approved by: re (gjb), adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D6896
Replace m_getcl() with m_get2(); this fixes 'frame too long'
messages for frames, which are longer than MCLBYTES
(can be easily triggered when A-MSDU is used).
Tested with RTL8188CUS (AP) and RTL8188EU (STA).
Approved by: re (marius)
Free data buffers every time when device is stopped, not when
it is detached; they are allocated at the initialization stage.
How-to-reproduce:
1) ifconfig wlan0 create wlandev urtwn0 up
2) vmstat -m | grep USBdev
3) service netif restart
4) vmstat -m | grep USBdev
Also, remove usbd_transfer_drain() call; it is already called by
usbd_transfer_unsetup().
Tested with RTL8188CUS, STA mode.
Approved by: re (marius)
The new 'machdep.disable_msix_migration' tunable can be set to 1 to
disable migration of MSI-X interrupts.
Xen versions prior to 4.6.0 do not properly handle updates to MSI-X
table entries after the initial write. In particular, the operation
to unmask a table entry after updating it during migration is not
propagated to the "real" table for passthrough devices causing the
interrupt to remain masked. At least some systems in EC2 are
affected by this bug when using SRIOV. The tunable can be set in
loader.conf as a workaround.
Submitted by: Jeremiah Lott <jlott@averesystems.com> (original patch)
Approved by: re (marius)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6947
sys/sys/bitstring.h
Fix a rounding calculation that could undersize a bitstring on
32-bit platforms.
tests/sys/sys/bitstring_test.h
Add a test for bitstr_size
PR: 210260
Reported by: Mark Millard
Reviewed by: gibbs
Approved by: re (marius)
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6848
teardown of VNETs once pf(4) has been shut down.
Properly split resources into VNET_SYS(UN)INITs and one time module
loading.
While here cover the INET parts in the uninit callpath with proper
#ifdefs.
Approved by: re (gjb)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
borrow pf's lock, and also make sure pflog goes after pf is gone
in order to avoid callouts in VNETs to an already freed instance.
Reported by: Ivan Klymenko, Johan Hendriks on current@ today
Obtained from: projects/vnet
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
Approved by: re (gjb)
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.
Reviewed by: kp
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6924
use. Update comments regarding the spare fields in struct inpcb.
Bump __FreeBSD_version for the changes to the size of the structures.
Reviewed by: gnn@
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
While reading the code, I noticed that shm_read() returns without unlocking
foffset and rangelock if mac_posixshm_check_read() rejects the read.
Reviewed by: kib, jhb, rwatson
Approved by: re (gjb)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6927
The GEOM disk d_mtx is only acquired on disk creation and destruction.
It is a good candidate for replacement with a pool mutex. This eliminates
the mutex initialization and teardown and the mutex and name variables
themselves from struct disk.
sys/geom/geom_disk.h:
Take d_mtx and d_mtx_name out of struct disk.
sys/geom/geom_disk.c:
Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk
initialization state instead of a dedicated mutex.
This allows removing the initialization and destruction of
d_mtx.
sys/sys/param.h:
Bump __FreeBSD_version to 1100119 for the change to struct disk.
Suggested by: jhb
Sponsored by: Spectra Logic
Approved by: re (gjb)
this code in the userland stack, it could result in a loop. This happened on iOS.
However, I was not able to reproduce this when using the code in the kernel.
Thanks to Eugen-Andrei Gavriloaie for reporting the issue and proving detailed
information to find the root of the problem.
Approved by: re (gjb)
MFC after: 1 week
If the allocation attempt fails, we may otherwise VM_WAIT after a failed
attempt to reclaim contiguous memory in the requested range. After r297466,
this results in the thread going to sleep, causing a hang during boot.
Reviewed by: jkim, kib
Approved by: re (gjb)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D6945
waiters exist, same as for vm_page_xunbusy(). If previous value of
busy_lock was VPB_SINGLE_EXCLUSIVER, no waiters existed and wakeup is
not needed.
Move common code from vm_page_xunbusy_maybelocked() and
vm_page_xunbusy_hard() to vm_page_xunbusy_locked().
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
getzfsvfs() called vfs_busy() in the waiting mode while having a hold on
a pool (via a call to dmu_objset_hold). In other words,
dp_config_rwlock was held in the shared mode while a thread could be
sleeping in vfs_busy().
The pool's txg sync thread needs to take dp_config_rwlock in the
exclusive mode for some actions, e.g., for executing sync tasks. If the
sync thread gets blocked, then any thread waiting for its sync task to
get executed is also blocked. Which, in turn, could mean that
vfs_busy() will keep waiting indefinitely.
The solution is to use vfs_ref() in the locked section and to call
vfs_busy() only after dropping other locks.
Note that a reference on a struct mount object does not prevent an
associated zfsvfs_t object from being destroyed. So, we have to be
careful to operate only on the struct mount object until we successfully
vfs_busy it.
Approved by: re (gjb)
MFC after: 2 weeks
vcxgbe/vcxl interfaces and retire the 'n' interfaces. The main
cxgbe/cxl interfaces and tunables related to them are not affected by
any of this and will continue to operate as usual.
The driver used to create an additional 'n' interface for every
cxgbe/cxl interface if "device netmap" was in the kernel. The 'n'
interface shared the wire with the main interface but was otherwise
autonomous (with its own MAC address, etc.). It did not have normal
tx/rx but had a specialized netmap-only data path. r291665 added
another set of virtual interfaces (the 'v' interfaces) to the driver.
These had normal tx/rx but no netmap support.
This revision consolidates the features of both the interfaces into the
'v' interface which now has a normal data path, TOE support, and native
netmap support. The 'v' interfaces need to be created explicitly with
the hw.cxgbe.num_vis tunable. This means "device netmap" will not
result in the automatic creation of any virtual interfaces.
The following tunables can be used to override the default number of
queues allocated for each 'v' interface. nofld* = 0 will disable TOE on
the virtual interface and nnm* = 0 to will disable native netmap
support.
# number of normal NIC queues
hw.cxgbe.ntxq_vi
hw.cxgbe.nrxq_vi
# number of TOE queues
hw.cxgbe.nofldtxq_vi
hw.cxgbe.nofldrxq_vi
# number of netmap queues
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi
hw.cxgbe.nnm{t,r}xq{10,1}g tunables have been removed.
--- tl;dr version ---
The workflow for netmap on cxgbe starting with FreeBSD 11 is:
1) "device netmap" in the kernel config.
2) "hw.cxgbe.num_vis=2" in loader.conf. num_vis > 2 is ok too, you'll
end up with multiple autonomous netmap-capable interfaces for every
port.
3) "dmesg | grep vcxl | grep netmap" to verify that the interface has
netmap queues.
4) Use any of the 'v' interfaces for netmap. pkt-gen -i vcxl<n>... .
One major improvement is that the netmap interface has a normal data
path as expected.
5) Just ignore the cxl interfaces if you want to use netmap only. No
need to bring them up. The vcxl interfaces are completely independent
and everything should just work.
---------------------
Approved by: re@ (gjb@)
Relnotes: Yes
Sponsored by: Chelsio Communications
This patch addes missing implementation of BHND_BUS_RESET_CORE function for BCMA.
The reset procedure is very simple: enable reset mode, stop clocking,
enable clocking & force clock gating, disable reset mode, stop clock gating.
Tested:
* (michael) Tested on ASUS RT-N53 for enabling/reset USB core
Submitted by: Michael Zhilin <mizhka@gmail.com>
Approved by: re (gjb)