This is a simple call to kmallock_array/kfree, therefore include linux/slab.h as
this is where the kmalloc_array/kfree definition is.
Sponsored-by: The FreeBSD Foundation
Reviewed by: hselsasky
Differential Revision: https://reviews.freebsd.org/D24794
bitmap_copy simply copy the bitmaps, no idea why it exists.
bitmap_andnot is similar to bitmap_and but uses !src2.
Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24782
Those function are use to map/unmap io region of a pci device.
Different resource can be mapped depending on the bar so use a
tailq to store them all.
Sponsored-by: The FreeBSD Foundation
Reviewed by: emaste, hselasky
Differential Revision: https://reviews.freebsd.org/D24696
in the LinuxKPI.
This allows synchronize RCU to be used inside a SRCU read section.
No functional change intended.
Bump the __FreeBSD_version to force recompilation of external kernel modules.
PR: 242272
MFC after: 1 week
Sponsored by: Mellanox Technologies
pci_iov_if.h was added to pci.h, but none of the kms-drm branches have
that. Rather than play whack a mole with the branches, move its inclusion to
linux_pci.c which is the only part of the code that needs it now.
Longer term, other solutions will be needed, but this gives us time to get those
deployed on all the supported versions.
Mesa's drm_syncobj usage, in the LinuxKPI.
While at it optimise the jiffies conversion functions to avoid repeated
and constant calculations.
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D23846
MFC after: 1 week
Sponsored by: Mellanox Technologies
For drmkpi (D23085) we don't want the Linux struct file as we don't emulate
everything. Also the prototypes should be in shmem_fs.h to have 100%
compatibility with Linux.
Reviewed by: hselasky
MFC after: Maybe
Differential Revision: https://reviews.freebsd.org/D23764
This function test if the string str begins with the string pointed
at by prefix.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23767
This function just test if the element is the first of the list.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23766
Its use of the page lock is incorrect, and it is not used by the DRM
modules.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23002
And remove the inline/deprecated attribute use entirely in stdlib.h, from
r355747. The intent was to provide a buildable API transitionary period, but
clearly that was counter-productive.
Reported by: delphij, imp, others
Probably all of these linuxkpi stubs should be '#ifndef' guarded, but maybe
that would prevent people from noticing when they are defined.
Introduced in r355759. For some reason I only ran a buildworld and not a
kernel. Mea culpa.
Reported by: Mark Millard
X-MFC-with: r355759
Bump the __FreeBSD_version to force recompilation of
external kernel modules due to structure change.
Differential Revision: https://reviews.freebsd.org/D21564
Submitted by: Greg V <greg@unrelenting.technology>
MFC after: 1 week
Sponsored by: Mellanox Technologies
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
This patch makes the DRM graphics driver in ports usable on aarch64.
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21008
MFC after: 1 week
Sponsored by: Mellanox Technologies
The DRM drivers use the lockdep assertion macros with spinlock_t locks
which are backed by mutexes, not sx locks. This causes compile
failures since you can't use sx_assert with a mutex. Instead, change
the lockdep macros to use lock_class methods. This works by assuming
that each LinuxKPI locking primitive embeds a FreeBSD lock as its
first structure and uses a cast to get to the underlying 'struct
lock_object'.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20992
PowerPC, and possibly other architectures, use different address ranges for
PCI space vs physical address space, which is only mapped at resource
activation time, when the BAR gets written. The DRM kernel modules do not
activate the rman resources, soas not to waste KVA, instead only mapping
parts of the PCI memory at a time. This introduces a
BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI
driver, to perform this necessary translation without activating the
resource.
In addition to system KPI changes, LinuxKPI is updated to handle a
big-endian host, by adding proper endian swaps to the I/O functions.
Submitted by: mmacy
Reported by: hselasky
Differential Revision: https://reviews.freebsd.org/D21096
LINUXKPI_VERSION macro is not defined for any compiled LinuxKPI code
which basically means __GFP_NOTWIRED is never checked when allocating
pages. This should work fine with the existing external DRM code as
long as the page wiring and unwiring is balanced.
MFC after: 3 days
Sponsored by: Mellanox Technologies
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Differential Revision: D20719
Previously it did this only on platforms without a direct map. This
also more closely matches Linux's semantics.
Since some DRM v5.0 code assumes the old behaviour, use a
LINUXKPI_VERSION guard to preserve that until the out-of-tree module
is updated.
Reviewed by: hselasky, kib (earlier versions), johalun
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20502
seq_file.h and linux_seq_file.c was imported form ports earlier but
linux_seq_file.c was never compiled in with the module. With this
commit base seq_file will replace ports seq_file and it required a
few modifications to not break functionality and build.
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
DRM drivers expect tasklets to have a counter for enable/disable calls.
Also, add a few more tasklet locking functions.
This patch is part of D19565
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Assign self as group leader at creation to act as the only member of a
new process group.
This patch is part of D19565
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Check LINUXKPI_VERSION macro for backwards compatibility.
It's recommended to update any drivers that depend on the older KPI
so we can deprecate < 5.0 code as we update to newer Linux version.
This patch is part of D19565
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Also, make ktime_get_raw call getnanouptime instead of getnanotime
to match (the correct) ktime_get_raw_ns.
This patch is part of D19565
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.
Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
MFC after: 3 days
Sponsored by: Mellanox Technologies
As Linux comment for this function point:
Signal to the system that the PCI device is not in use by the system
anymore. This only involves disabling PCI bus-mastering, if active.
Build tested drm-current-kmod prior to commit.
MFC after: 1 week
Submitted by: slavash@
Sponsored by: Mellanox Technologies
Turning on pr_debug at compile time make it non-optional at runtime.
This often means that the amount of the debugging is unbearable.
Allow developer to turn on pr_debug output only when needed.
Build tested drm-current-kmod prior to commit.
MFC after: 1 week
Submitted by: kib@
Sponsored by: Mellanox Technologies
The S/G list must be mapped AS-IS without any optimisations.
This also implies that sg_dma_len() must be equal to sg->length.
Many Linux drivers assume this and this fixes some DRM issues.
Put the BUS DMA map pointer into the scatter-gather list to
allow multiple mappings on the same physical memory address.
The FreeBSD version has been bumped to force recompilation of
external kernel modules.
Sponsored by: Mellanox Technologies
Fix some style while at it.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Limelight Networks
Sponsored by: Mellanox Technologies
When porting code once written for Linux we find not only uints but also ushort and ulong.
Provide central typedefs as part of the linuxkpi for those as well.
Reviewed by: hselasky, emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19405
- offsets can be negative, loff_t needs to be signed, it also simplifies
interop with the rest of the code base to use off_t than the actual linux
definition "long long"
- don't rely on the defining "file" to "linux_file" in interface definitions
as that causes heartache with includes
Reviewed by: hps@
MFC after: 1 week
Sponsored by: iX Systems
Differential Revision: https://reviews.freebsd.org/D19274
Some consumers may be loosely coupled with the lkpi.
This allows them to call linux_alloc_current without
having a static dependency.
Reviewed by: hps@
MFC after: 1 week
Sponsored by: iX Systems
Differential Revision: https://reviews.freebsd.org/D19257
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
The check was not introduced in r342628, but the subsequent unchecked access to
refs was added then, prompting a Coverity warning about "Null pointer
dereferences (FORWARD_NULL)." The warning is bogus due to M_WAITOK, but so is
the NULL check that hints it, so just remove it.
CID: 1398588
Reported by: Coverity
the destroying cdev.
Currently linux_destroy_dev() waits for the reference count on the
linux cdev to drain, and each open file hold the reference.
Practically it means that linux_destroy_dev() is blocked until all
userspace processes that have the cdev open, exit. FreeBSD devfs does
not have such problem, because device refcount only prevents freeing
of the cdev memory, and separate 'active methods' counter blocks
destroy_dev() until all threads leave the cdevsw methods. After that,
attempts to enter cdevsw methods are refused with an error.
Implement somewhat similar mechanism for LinuxKPI cdevs. Demote cdev
refcount to only mean a hold on the linux cdev memory. Add sirefs
count to track both number of threads inside the cdev methods, and for
single-bit indicator that cdev is being destroyed. In the later case,
the call is redirected to the dummy cdev.
Reviewed by: markj
Discussed with: hselasky
Tested by: zeising
MFC after: 1 week
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D18606
Currently we always return false if for PCI offline query.
Try to read PCI config, if the return value if 0xffff probably the
PCI is offline.
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Make sure we hold a reference on the character device for every opened file
to prevent the character device to be freed prematurely.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages. vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary. It's a
somewhat legacy component of the system and isn't required. You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.
Based on this, we want pageproc to emulate kswapd, not vmproc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D18061
These are used by kms-drm to determine various heuristics relate
memory conditions.
The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.
This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h
Reviewed by: mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D18052
Currently the compiler picks up the definition in machine/cpufunc.h.
Add compiler memory barriers to read* and write*. The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.
Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.
Qualify the addr parameter in write* as volatile.
Like Linux, define macros with the same name as the inline functions.
Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.
Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian
Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.
Reviewed by: hselasky
MFC after: 3 days
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().
Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.
Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.
Update the nearby comment for UMA_SLAB_KERNEL.
Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725
While at it rename hlist_add_after() into hlist_add_behind().
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
the LinuxKPI. Add a comment saying in which Linux version this change was made.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
in the LinuxKPI. While at it document when to use the "virtual_address" or
the "address" field in the "vm_fault" structure.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
sg_alloc_table_from_pages() function in the LinuxKPI.
This basically allow segments to have a limit, max_segment.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
with upstream Linux by returning the pointer to the removed element.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Add macros to allow preinitialization of cap_rights_t.
- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.
Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month
- Make sure Giant is locked when calling PCI device methods.
Newbus currently requires this.
- Avoid unlocking Giant right before aquiring the sleepqueue lock.
This can save a task switch.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When disabling the MSIX IRQ vectors for a PCI device through the
LinuxKPI, make sure any old MSIX IRQ numbers are no longer visible to
the linux_pci_find_irq_dev() function else IRQs can be requested from
the wrong PCI device.
MFC after: 1 week
Sponsored by: Mellanox Technologies
instead of pause() in the msleep() function to avoid rounding errors when
converting delay values forth and back. Add a guard for a delay value
of zero milliseconds which is undefined.
MFC after: 1 week
Requested by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
delayed_work in the LinuxKPI. This allows the timer_pending() function
macro to be used with delayed work structures.
No functional nor structural change.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
signal in the LinuxKPI.
The read(), write() and mmap() system calls can return either EINTR or
ERESTART upon receiving a signal. Add code to figure out the correct
return value by temporarily storing the return code from the relevant
FreeBSD kernel APIs in the Linux task structure.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Make sure all RCU dereferencing use the READ_ONCE() function macro.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
printk_ratelimited() function macro to return a boolean stating if there
was a printout, true, or not, false.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
LinuxKPI to be compatible with Linux. No functional change.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
handler in the LinuxKPI. This is needed when the interrupt handler is disabled
before freeing the interrupt.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire). Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.
Reported and tested by: Slava Shwartsman
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
actually return the computed result instead of the input value.
This is a regression issue after r289572.
Found by: gcc6
MFC after: 3 days
Sponsored by: Mellanox Technologies
LinuxKPI task struct. Change type of "state" variable from "int" to
"atomic_t" to simplify code and avoid unneccessary casting.
MFC after: 1 week
Sponsored by: Mellanox Technologies
gets drained before invoking the work function. Else the timer
mutex may still be in use which can lead to use-after-free situations,
because the work function might free the work structure before returning.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Bump the FreeBSD version to force recompilation of external
kernel modules due to structure change.
PR: 222504
Submitted by: Greg V <greg@unrelenting.technology>
MFC after: 1 week
Sponsored by: Mellanox Technologies
have a scope ID. Change size of the searched scope ID to the full
16-bits. There can typically be more than 255 interfaces.
Suggested by: ae @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Workaround problem that ifa_ifwithaddr() also matches the scope ID of
the IPv6 address when searching for a maching IPv6 address. For now
simply try all valid scope IDs until a match is found.
MFC after: 1 week
Sponsored by: Mellanox Technologies
use of the linux_poll_wakeup() function from unsafe contexts, which
can lead to use-after-free issues.
Instead of calling linux_poll_wakeup() directly use the wake_up()
family of functions in the LinuxKPI to do this.
Bump the FreeBSD version to force recompilation of external kernel modules.
MFC after: 1 week
Sponsored by: Mellanox Technologies
in the VM area structure in the LinuxKPI when doing mmap() and that
unsupported bits are masked away.
While at it fix some redundant use of parenthesing inside some related
macros.
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Such drivers attach to a vgapci bus rather than directly to a pci bus. For
the rest of the LinuxKPI to work correctly in this case, we override the
vgapci bus' ivars with those of the grandparent.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11932
after r319757.
1) Correct the return value from __wait_event_common() from 1 to 0 in
case the timeout is specified as MAX_SCHEDULE_TIMEOUT. In the other
case __ret is zero and will be substituted in the last part of the
macro with the appropriate value before return.
2) Make sure the "timeout" argument is casted to "int" before
evaluating negativity. Else the signedness of a "long" might be
checked instead of the signedness of an integer.
3) The wait_event() function should not have a return value.
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
While there, switch to FreeBSD internal callout active status.
Reviewed by: markj, hselasky
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11900
wide enough to hold the full 64-bit dev_t. Instead use the "dev" field in
the "linux_cdev" structure to store and lookup this value.
While at it remove superfluous use of parenthesis inside the
MAJOR(), MINOR() and MKDEV() macros in the LinuxKPI.
MFC after: 1 week
Sponsored by: Mellanox Technologies
It looks like the __acquire and __release macros are for the consumption
of static analysis tools and have no semantic effect. Transform the
definitions from constant expressions to empty statements in order to
avoid -Wunused-value from gcc.
Likewise avoid future warnings for __chk_{user,io}_ptr, but with a cast
to void, because it looks like some linux kernel code may use those in
expression contexts.
Reviewed by: hselasky, markj
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11695
Also add some checks for overflow to existing functions.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11533
Add io_mapping_init_wc() and add a third (unused) parameter to
io_mapping_map_wc().
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11286
list.h includes a number of FreeBSD headers as a workaround for the
LIST_HEAD name collision. To reduce pollution, avoid including list.h
in commonly used headers when it is not explicitly needed.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11249
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
calling thread to sleep.
This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10986
ARM and MIPS fail universe builds.
ARM and MIPS are missing the following:
* VM_MEMATTR_WRITE_THROUGH
* VM_MEMATTR_WRITE_COMBINING
Pointy-hat to: jhibbits
arm, mips, and powerpc all implement pmap_mapdev_attr() and pmap_unmapdev(),
so add those archs to the checks. powerpc also includes the atomic_swap_*()
functions, so add that to the supported list as well. Not tested except by
compiling powerpc.
Reviewed by: markj
polling contexts in the LinuxKPI.
After the kqueue() support was added to the LinuxKPI in r319409 the
Linux poll file operation will be used outside the system file polling
callback function, which can cause a NULL-pointer panic inside
selrecord() because curthread->td_sel is set to NULL. This patch moves
the selrecord() call away from poll_wait() and to the system file poll
callback function in the LinuxKPI, which essentially wraps the Linux
one. This is similar to what the cuse(3) module is currently doing.
Refer to sys/fs/cuse/*.[ch] for more details.
MFC after: 1 week
Sponsored by: Mellanox Technologies
devices. The implementation allows read and write filters to be
created and piggybacks on the poll() file operation to determine when
a filter should trigger. The piggyback mechanism is simply to check
for the EWOULDBLOCK or EAGAIN return code from read(), write() or
ioctl() system calls and then update the kqueue() polling state bits.
The implementation is similar to the one found in the cuse(3) module.
Refer to sys/fs/cuse/*.[ch] for more details.
MFC after: 1 week
Sponsored by: Mellanox Technologies
printk_ratelimited() in the LinuxKPI.
While at it fix the inclusion guard of printk.h to be similar to the
rest of the LinuxKPI header files.
MFC after: 1 week
Sponsored by: Mellanox Technologies
task structure to avoid deadlock when tearing down the VM object
during a process exit.
Found by: markj @
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Allow "struct linux_file" to be refcounted when its "_file" member
is NULL by using its "f_count" field. The reference counts are
transferred to the file structure when the file descriptor is
installed.
- Add missing vdrop() calls for error cases during open().
- Set the "_file" member of "struct linux_file" during open. This
allows use of refcounting through get_file() and fput() with LinuxKPI
character devices.
MFC after: 1 week
Sponsored by: Mellanox Technologies
kit, CK, in the LinuxKPI.
When threads are pinned to a CPU core or when there is only one CPU,
it can happen that a higher priority thread can call the CK
synchronize function while a lower priority thread holds the read
lock. Because the CK's synchronize is a simple wait loop this can lead
to a deadlock situation. To solve this problem use the recently
introduced CK's wait callback function.
When detecting a CK blocking condition figure out the lowest priority
among the blockers and update the calling thread's priority and
yield. If another CPU core is holding the read lock, pin the thread to
the blocked CPU core and update the priority. The calling threads
priority and CPU bindings are restored before return.
If a thread holding a CK read lock is detected to be sleeping, pause()
will be used instead of yield().
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Move all bitmap related functions from bitops.h to bitmap.h, similar
to what Linux does.
- Apply some minor code cleanup and simplifications to optimize the
generated code when using static inline functions.
- Implement the following list of bitmap functions which are needed by
drm-next and ibcore:
- bitmap_find_next_zero_area_off()
- bitmap_find_next_zero_area()
- bitmap_or()
- bitmap_and()
- bitmap_xor()
- Add missing include directives to the qlnxe driver
(davidcs@ has been notified)
MFC after: 1 week
Sponsored by: Mellanox Technologies
In FreeBSD thread IDs and procedure IDs have distinct number
spaces. When asking for the group leader task ID in the LinuxKPI,
return the procedure ID and let this resolve to the first task in the
procedure having a valid LinuxKPI task structure pointer.
MFC after: 1 week
Sponsored by: Mellanox Technologies
like open, close and fault using the character device pager.
Some notes about the implementation:
1) Linux drivers set the vm_ops and vm_private_data fields during a
mmap() call to indicate that the driver wants to use the LinuxKPI VM
operations. Else these operations are not used.
2) The vm_private_data pointer is associated with a VM area structure
and inserted into an internal LinuxKPI list. If the vm_private_data
pointer already exists, the existing VM area structure is used instead
of the allocated one which gets freed.
3) The LinuxKPI's vm_private_data pointer is used as the callback
handle for the FreeBSD VM object. The VM subsystem in FreeBSD has a
similar list to identify equal handles and will only call the
character device pager's close function once.
4) All LinuxKPI VM operations are serialized through the mmap_sem
sempaphore, which is per procedure, which prevents simultaneous access
to the shared VM area structure when receiving page faults.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
the LinuxKPI for accessing user-space memory in the kernel.
Add functions to hold and wire physical page(s) based on a given range
of user-space virtual addresses.
Add functions to get and put a reference on, wire, hold, mark
accessed, copy and dirty a physical page.
Add new VM related structures and defines as a preparation step for
advancing the memory map capabilities of the LinuxKPI.
Add function to figure out if a virtual address was allocated using
malloc().
Add function to convert a virtual kernel address into its physical
page pointer.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
functions in the LinuxKPI. Add a usage atomic to the task_struct
structure to facilitate refcounting the task structure when returned
from get_pid_task(). The get_task_struct() and put_task_struct()
function is used to manage atomic refcounting. After this change the
task_struct should only be freed through put_task_struct().
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
some associated helper functions in the LinuxKPI. Let the existing
linux_alloc_current() function allocate and initialize the new
structure and let linux_free_current() drop the refcount on the memory
mapping structure. When the mm_struct's refcount reaches zero, the
structure is freed.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
pairwise to support the FreeBSD way of pushing and popping the page
fault flags. Ensure this by requiring every occurrence of pagefault
disable function call to have a corresponding pagefault enable call.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Support is implemented by mapping Linux's "struct net" into FreeBSD's
"struct vnet". Currently only vnet0 is supported by ibcore.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When locking a mutex and deadlock is detected the first mutex lock
call that sees the deadlock will return -EDEADLK .
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add support for using mutexes during KDB and shutdown. This is also
required for doing mode-switching during panic for drm-next.
Add new mutex functions mutex_init_witness() and mutex_destroy()
allowing LinuxKPI mutexes to be tracked by witness.
Declare mutex_is_locked() and mutex_is_owned() like inline functions
to get cleaner warnings. These functions are used inside WARN_ON()
statements which might look a bit odd if these functions get fully
expanded.
Give mutexes better debug names through the mutex_name() macro when
WITNESS_ALL is defined. The mutex_name() macro can prefix parts of the
filename and line number before the mutex name.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Put large functions into linux_slab.c instead of declaring them static
inline.
Add support for more memory allocation wrappers like kmalloc_array()
and __vmalloc().
Make sure either the M_WAITOK or the M_NOWAIT flag is set and mask
away unused memory allocation flags before calling FreeBSD's malloc()
routine.
Move kmalloc_node() definition to slab.h where it belongs.
Implement support for the SLAB_DESTROY_BY_RCU feature when creating a
kmem_cache which basically means kmem_cache memory is freed using
call_rcu().
MFC after: 1 week
Sponsored by: Mellanox Technologies
LinuxKPI. When the type of the argument is constant the temporary
variable cannot be assigned after the barrier. Instead assign the
temporary variable by initialization.
MFC after: 1 week
Sponsored by: Mellanox Technologies
This change makes the workqueue implementation behave more like in
Linux, both functionality wise and structure wise.
All workqueue code has been moved to linux_work.c
Add an atomic based statemachine to the work_struct to ensure proper
operation. Prior to this change struct_work was directly mapped to a
FreeBSD task. When a taskqueue has multiple threads the same task may
end up being executed on more than one worker thread simultaneously.
This might cause problems with code coming from Linux, which expects
serial behaviour, similar to Linux tasklets.
Move all global workqueue function names into the linux_xxx domain to
avoid symbol name clashes in the future.
Implement a few more workqueue related functions and macros.
Create two multithreaded taskqueues for the LinuxKPI during module
load, one for time-consuming callbacks and one for non-time consuming
callbacks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
WITNESS_ALL is defined. The lock name is based on the filename and
line number where the initialisation happens.
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Optimise the RCU implementation to not allocate and free
ck_epoch_records during runtime. Instead allocate two sets of
ck_epoch_records per CPU for general purpose use. The first set is
only used for reader locks and the second set is only used for
synchronization and barriers and is protected with a regular mutex to
prevent simultaneous issues.
- Move the task structure away from the rcu_head structure and into
the per-CPU structures. This allows the size of the rcu_head structure
to be reduced down to the size of two pointers.
- Fix a bug where the linux_rcu_barrier() function only waited for one
per-CPU epoch record to be completed instead of all.
- Use a critical section or a mutex to protect ck_epoch_begin() and
ck_epoch_end() depending on RCU or SRCU type. All the ck_epoch_xxx()
functions, except ck_epoch_register(), ck_epoch_unregister() and
ck_epoch_recycle() are not re-entrant and needs a critical section or
a mutex to operate in the LinuxKPI, after inspecting the CK
implementation of the above mentioned functions. The simultaneous
issues arise from per-CPU epoch records being shared between multiple
threads depending on the amount of taskswitching and how many threads
are involved with the RCU and SRCU operations.
- Properly free all epoch records by using safe list traversal at
LinuxKPI module unload. It turns out the ck_epoch_recycle() always
have the records on an internal list and use a flag in the epoch
record to track allocated and free entries. This would lead to use
after free during module unload.
- Remove redundant synchronize_rcu() call from the
linux_compat_uninit() function. Let the linux_rcu_runtime_uninit()
function do the final rcu_barrier() instead.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The clang compiler will optimise these functions down to three AMD64
instructions if the bit argument is a constant during compilation.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When allocating unmapped pages, take advantage of the direct map on
AMD64 to get the virtual address corresponding to a page. Else all
pages allocated must be mapped because sometimes the virtual address
of a page is requested.
Move all page allocation and deallocation code into an own C-file.
Add support for GFP_DMA32, GFP_KERNEL, GFP_ATOMIC and __GFP_ZERO
allocation flags.
Make a clear separation between mapped and unmapped allocations.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The i915kms driver in Linux 4.9 reimplement parts of the scatter list
functions with regards to performance. In other words there is not so
much room for changing structure layouts and functionality if the
i915kms should be built AS-IS. This patch aligns the scatter list
support to what is expected by the i915kms driver. Remove some
comments not needed while at it.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
1) Add better spinlock debug names when WITNESS_ALL is defined.
2) Make sure that the calling thread gets bound to the current CPU
while a spinlock is locked. Some Linux kernel code depends on that the
CPU ID doesn't change while a spinlock is locked.
3) Add support for using LinuxKPI spinlocks during a panic().
MFC after: 1 week
Sponsored by: Mellanox Technologies
Tasklets are implemented using a taskqueue and a small statemachine on
top. The additional statemachine is required to ensure all LinuxKPI
tasklets get serialized. FreeBSD taskqueues do not guarantee
serialisation of its tasks, except when there is only one worker
thread configured.
MFC after: 1 week
Sponsored by: Mellanox Technologies
A set of helper functions have been added to manage the life of the
LinuxKPI task struct. When an external system call or task is invoked,
a check is made to create the task struct by demand. A thread
destructor callback is registered to free the task struct when a
thread exits to avoid memory leaks.
This change lays the ground for emulating the Linux kernel more
closely which is a dependency by the code using the LinuxKPI APIs.
Add new dedicated td_lkpi_task field has been added to struct thread
instead of abusing td_retval[1].
Fix some header file inclusions to make LINT kernel build properly
after this change.
Bump the __FreeBSD_version to force a rebuild of all kernel modules.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The intended use is to annotate frequently used globals which either rarely
change (and thus can be grouped in the same cacheline) or are an atomic counter
(which means it may benefit from being the only variable in the cacheline).
Linker script support is provided only for amd64. Architectures without it risk
having other variables put in, i.e. as if they were not annotated. This is
harmless from correctness point of view.
Reviewed by: bde (previous version)
MFC after: 1 month
the ones obtained through devclass_get_device(). Some minor code
cleanups while at it.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Linux list_for_each_entry() does not neccessarily end with the iterator
NULL (it may be an offset from NULL if the list member is not the first
element of the member struct).
Reported by: Coverity
CID: 1366940
Reviewed by: hselasky@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8780
My plan is to change this function's prototype at some point in the
future to add a new label argument, which can be used to export all of
sysctl as metrics that can be scraped by Prometheus. Switch over this
caller to use the macro wrapper counterpart.
conflict with the opensolaris kernel module.
This patch solves a problem where the kernel linker will incorrectly
resolve opensolaris kmem_xxx() functions as linuxkpi ones, which leads
to a panic when these functions are used.
Submitted by: gallatin @
Sponsored by: Mellanox Technologies
MFC after: 1 week
FreeBSD supports lazy allocation of PCI BAR, that is, when a device
driver's attach method is invoked, even if the device's PCI BAR
address wasn't initialized, the invocation of bus_alloc_resource_any()
(the call chain: pci_alloc_resource() -> pci_alloc_multi_resource() ->
pci_reserve_map() -> pci_write_bar()) would allocate a proper address
for the PCI BAR and write this 'lazy allocated' address into the PCI
BAR.
This model works fine for native FreeBSD device drivers, but _not_ for
device drivers shared with Linux (e.g. dev/mlx5/mlx5_core/mlx5_main.c
and ofed/drivers/net/mlx4/main.c. Both of them use
pci_request_regions(), which doesn't work properly with the PCI BAR
lazy allocation, because pci_resource_type() -> _pci_get_rle() always
returns NULL, so pci_request_regions() doesn't have the opportunity to
invoke bus_alloc_resource_any(). We now use pci_find_bar() in
pci_resource_type(), which is able to locate all available PCI BARs
even if some of them will be lazy allocated.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: hps
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8071
Linux module parameters have a permissions value. If any write bits
are set we are allowed to modify the module parameter runtime. Reflect
this when creating the static SYSCTL nodes.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Bool module parameters are no longer supported, because there is no
equivalent in FreeBSD.
There are two macros available which control the behaviour of the
LinuxKPI module parameters:
- LINUXKPI_PARAM_PARENT allows the consumer to set the SYSCTL parent
where the modules parameters will be created.
- LINUXKPI_PARAM_PREFIX defines a parameter name prefix, which is
added to all created module parameters.
Sponsored by: Mellanox Technologies
MFC after: 1 week
run after a panic(). This for example allows a LinuxKPI based graphics
stack to receive prints during a panic.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
at it use NULL for some pointer checks.
Bump the FreeBSD version to force recompilation of all kernel modules
due to a structure size change.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
streamline the rest of the xxx_to_jiffies() functions to have a
constant 64-bit argument and use identical range checks for the
result.
Specifically preserve msecs_to_jiffies(0) returning 0. See r282743 for
further details.
MFC after: 1 week
Sponsored by: Mellanox Technologies
error code checks might fail. ERESTART is in the BSD world defined as
-1. While at it add more Linux error codes.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Linux requires that all IOCTL data resides in userspace. FreeBSD
always moves the main IOCTL structure into a kernel buffer before
invoking the IOCTL handler and then copies it back into userspace,
before returning. Hide this difference in the "linux_copyin()" and
"linux_copyout()" functions by remapping userspace addresses in the
range from 0x10000 to 0x20000, to the kernel IOCTL data buffer.
It is assumed that the userspace code, data and stack segments starts
no lower than memory address 0x400000, which is also stated by "man 1
ld", which means any valid userspace pointer can be passed to regular
LinuxKPI handled IOCTLs.
Bump the FreeBSD version to force recompilation of all kernel modules.
Discussed with: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
"current" inside all LinuxKPI file operation callbacks. The "current"
is frequently used for various debug prints, printing the thread name
and thread ID for example.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765