Commit graph

121608 commits

Author SHA1 Message Date
Warner Losh
e8bef32ce2 Reword comment to remove awkward constructs, including an "it's" that
shouldn't have been there at all (it wasn't a typo for its, rather a
left-over from an older revision of the comment).

Noticed by: many
2018-04-19 16:05:48 +00:00
John Baldwin
73c8686e91 Simplify the code to allocate stack for auxv, argv[], and environment vectors.
Remove auxarg_size as it was only used once right after a confusing
assignment in each of the variants of exec_copyout_strings().

Reviewed by:	emaste
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D15123
2018-04-19 16:00:34 +00:00
Warner Losh
b3e85e7a79 Intel drives have an optimal alignment for I/O. While they honor I/Os
that cross this boundary, they perform better when this isn't the
case. Intel uses the 3rd byte in the vendor specific area for
this. The DC P3500 was previously listed without any explanation. Add
the DC P3520 and DC P4500 to the list.

There won't be any others drives needing this quirk. Intel has
standardized a field in the namespace data in 1.3 (noiob).  A future
patch will use that if it exists, with fallback to this method.

Submitted by: Keith Busch
Reviewed by: jimharris@
2018-04-19 15:39:20 +00:00
Alexander Motin
596e6ade92 Release memory resource on cuda driver attach failure.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2018-04-19 15:29:10 +00:00
Conrad Meyer
179b21e8b1 cryptosoft: Do not exceed crd_len around *crypt_multi
When a caller passes in a uio or mbuf chain that is longer than crd_len, in
tandem with a transform that supports the multi-block interface,
swcr_encdec() would process the entire mbuf or uio instead of just the
portion indicated by crd_len (+ crd_skip).

De/encryption are performed in-place, so this would trash subsequent uio or
mbuf contents.

This was introduced in r331639 (mea culpa).  It only affects the
{de,en}crypt_multi() family of interfaces.  That interface only has one
consumer transform in-tree (for now): Chacha20.

PR:		227605
Submitted by:	Valentin Vergez <valentin.vergez AT stormshield.eu>
2018-04-19 15:24:21 +00:00
Randall Stewart
fd389e7cd5 These two modules need the tcp_hpts.h file for
when the option is enabled (not sure how LINT/build-universe
missed this) opps.

Sponsored by:	Netflix Inc
2018-04-19 15:03:48 +00:00
Mark Johnston
64b3893010 Initialize marker pages in vm_page_domain_init().
They were previously initialized by the corresponding page daemon
threads, but for vmd_inacthead this may be too late if
vm_page_deactivate_noreuse() is called during boot.

Reported and tested by:	cperciva
Reviewed by:	alc, kib
MFC after:	1 week
2018-04-19 14:09:44 +00:00
Randall Stewart
3ee9c3c4eb This commit brings in the TCP high precision timer system (tcp_hpts).
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.

MFC after:	Never
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15020
2018-04-19 13:37:59 +00:00
Andriy Gapon
f3f6ecb450 set kdb_why to "trap" when calling kdb_trap from trap_fatal
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.

Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred.  So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.

I have added the event only on architectures that have trap_fatal()
function defined.  I haven't looked at other architectures.  Their
maintainers can add support for the event later.

Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20

Reviewed by:	kib, jhb, markj
MFC after:	11 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15093
2018-04-19 05:06:56 +00:00
Konstantin Belousov
b940886338 Add PROC_PDEATHSIG_SET to procctl interface.
Allow processes to request the delivery of a signal upon death of
their parent process.  Supposed consumer of the feature is PostgreSQL.

Submitted by:	Thomas Munro
Reviewed by:	jilles, mjg
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D15106
2018-04-18 21:31:13 +00:00
Konstantin Belousov
919015a42e Fix pmap_trm_alloc(M_ZERO).
Sponsored by:	The FreeBSD Foundation
2018-04-18 20:09:26 +00:00
Konstantin Belousov
17cdb360cb For fatal traps other than pagefaults, print raw fault error codes.
For pagefaults, the error is already decoded and printed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-04-18 20:07:47 +00:00
John Baldwin
f36411145e Fix two off-by-one errors when allocating MSI and MSI-X interrupts.
x86 enforces an (arbitray) limit on the number of available MSI and
MSI-X interrupts to simplify code (in particular, interrupt_source[]
is statically sized).  This means that an attempt to allocate an MSI
vector needs to fail if it would go beyond the limit, but the checks
for exceeding the limit had an off-by-one error.  In the case of MSI-X
which allocates interrupts one at a time this meant that IRQ 768 kept
getting handed out multiple times for msix_alloc() instead of failing
because all MSI IRQs were in use.

Tested by:	lidl
MFC after:	1 week
2018-04-18 18:45:34 +00:00
John Baldwin
eb33d8ce3e Workaround fixed I/O port resources encoded as I/O port ranges in _CRS.
ACPI I/O port descriptors use _MIN and _MAX fields to specify the set
of allowable base (start) addresses for an I/O port resource along with
a _LEN field specifying the length.  A fixed resource is supposed to be
encoded with _MIN == _MAX, but some buggy firmwares instead set _MAX to
the end of the fixed range.  Relocating I/O ranges only make sense in
_PRS (possible resource settings), not in _CRS (current resource settings),
so if an I/O port range with _MAX set set to the end of the range is
present in _CRS, treat it as a fixed I/O port resource starting at
_MIN.

PR:		224096
Submitted by:	Harald Böhm <harald@boehm.codes>
Pointy hat to:	jhb (taking so long to actually commit this)
MFC after:	1 week
2018-04-18 18:36:26 +00:00
Andriy Gapon
6d83b2e971 don't check for kdb reentry in trap_fatal(), it's impossible
trap() checks for it earlier and calls kdb_reentry().

Discussed with:	jhb
MFC after:	12 days
Sponsored by:	Panzura
2018-04-18 15:44:54 +00:00
Stephen Hurd
0b75ac7796 iflib: Fix queue distribution when there are no threads
Previously, if there are no threads, all queues which targeted
cores that share an L2 cache were bound to a single core. The intent is
to distribute them across these cores.

Reported by:	olivier
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15120
2018-04-18 15:34:18 +00:00
Brooks Davis
edba22b0fb Remove references to fs_nofault_intr_begin/end.
These should have removed in r332656.

Reported by:	mjg, lidl
2018-04-17 22:30:00 +00:00
Mark Johnston
9de8fcfddf Ensure that m and skip_m belong to the same object.
Pages allocated from a given reservation may belong to different
objects. It is therefore possible for vm_page_ps_test() to be called
with the base page's object unlocked. Check for this case before
asserting that the object lock is held.

Reported by:	jhb
Reviewed by:	kib
MFC after:	1 week
2018-04-17 18:49:17 +00:00
John Baldwin
8ce99bb405 Properly do a deep copy of the ioctls capability array for fget_cap().
fget_cap() tries to do a cheaper snapshot of a file descriptor without
holding the file descriptor lock.  This snapshot does not do a deep
copy of the ioctls capability array, but instead uses a different
return value to inform the caller to retry the copy with the lock
held.  However, filecaps_copy() was returning 1 to indicate that a
retry was required, and fget_cap() was checking for 0 (actually
'!filecaps_copy()').  As a result, fget_cap() did not do a deep copy
of the ioctls array and just reused the original pointer.  This cause
multiple file descriptor entries to think they owned the same pointer
and eventually resulted in duplicate frees.

The only code path that I'm aware of that triggers this is to create a
listen socket that has a restricted list of ioctls and then call
accept() which calls fget_cap() with a valid filecaps structure from
getsock_cap().

To fix, change the return value of filecaps_copy() to return true if
it succeeds in copying the caps and false if it fails because the lock
is required.  I find this more intuitive than fixing the caller in
this case.  While here, change the return type from 'int' to 'bool'.

Finally, make filecaps_copy() more robust in the failure case by not
copying any of the source filecaps structure over.  This avoids the
possibility of leaking a pointer into a structure if a similar future
caller doesn't properly handle the return value from filecaps_copy()
at the expense of one more branch.

I also added a test case that panics before this change and now passes.

Reviewed by:	kib
Discussed with:	mjg (not a fan of the extra branch)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D15047
2018-04-17 18:07:40 +00:00
Brooks Davis
9c11d8d483 Remove the unused fuwintr() and suiwintr() functions.
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.

Reviewed by:	kib, andrew
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15102
2018-04-17 18:04:28 +00:00
Brooks Davis
427be5bb3a Remove unused implementations of copyoutstr().
Also remove the commented out documentation.  The documentation arrived
with the import of the copy.9 manpage.  I suspect the implementations
came from NetBSD while bootstrapping the Arm and MIPS ports.

Reviewed by:	andrew, jmallett
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15108
2018-04-17 17:20:04 +00:00
Andrew Gallatin
bca3808097 Restore SIOCGI2C functionality to ixgbe
When ixgbe was converted to iflib, it lost the SIOCGI2C support
that allows ifconfig to print SFP state, optical light levels, etc.
Restore this by plugging in to the ifdi_i2c_req iflib method.  Note
that the sanity checking on dev_addr that used to be done in ixgbe is
now done in iflib.

Reviewed by:	erj, Matthew Macy <mmacy@mattmacy.io>
Sponsored by:	Netflix
2018-04-17 16:51:27 +00:00
Warner Losh
a397def9fb Add PNP info to the PCI attahement of the puc driver.
Adjust sys/conf/files and sys/modules/puc/Makefile to omit
pucdata.c now tht it's included by puc_pci.c.

Submitted by: Lakhan Shiva Kamireddy (with build fixes by me)
Pull Request: https://github.com/freebsd/freebsd/pull/136
2018-04-17 16:46:08 +00:00
Warner Losh
aeb291cc32 Add PNP info to the bce driver.
Submitted by: Lakhan Shiva Kamireddy
Pull Request: https://github.com/freebsd/freebsd/pull/136
2018-04-17 16:46:01 +00:00
Brooks Davis
cee61c8cac Stop using fuswintr() and suswintr() in the profiler.
Always take the AST path rather than calling MD functions which are
often implemented as always failing. The is the case on amd64, arm,
i386, and powerpc. This optimization (inherited from 4.4 Lite) is a
pessimization on those architectures and is the sole use of these
functions. They will be removed in a seperate commit.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15101
2018-04-17 16:36:53 +00:00
Warner Losh
3531bbb5ea Restore db_radix on parse error, otherwise we'll silently change it to
10 on a botched trace command.
2018-04-17 15:44:05 +00:00
Ram Kishore Vegesna
5eaf9435d0 Moved opts-stack.h include before all other includes.
PR: 227446
Approved by: ken
MFC after: 3 days
2018-04-17 15:29:32 +00:00
Alan Somers
52c0983128 lio_listio: return EAGAIN instead of EIO when out of resources
This behavior is already documented by the man page, and suggested by POSIX.

Reviewed by:	jhb
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15099
2018-04-16 18:12:15 +00:00
Brooks Davis
b6cb3eab1e Remove unused badaddr() function.
Reviewed by:	jmallett
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15078
2018-04-16 17:43:26 +00:00
Warner Losh
8b999cc69b Use bool instead of boolean_t here. No reason to use boolean_t.
Also, stop passing FALSE to a bool parameter.
2018-04-16 13:52:40 +00:00
Warner Losh
9d89a9f326 No need to force md code to define a macro that's the same as
_BYTE_ORDER. Use that instead.
2018-04-16 13:52:23 +00:00
Justin Hibbits
3877c32ec9 Use a resource hint instead of environment variable for DIU mode
This makes it more consistent with FreeBSD norms, rather than using Linux's
norms.  Now, instead of needing an environment variable

  video-mode=fslfb:1280x1024@60

Now one would use a hint:

  hint.fb.0.mode=1280x1024@60
2018-04-16 04:02:53 +00:00
Alexander Motin
bbbac409fe 9433 Fix ARC hit rate
When the compressed ARC feature was added in commit d3c2ae1
the method of reference counting in the ARC was modified.  As
part of this accounting change the arc_buf_add_ref() function
was removed entirely.

This would have be fine but the arc_buf_add_ref() function
served a second undocumented purpose of updating the ARC access
information when taking a hold on a dbuf.  Without this logic
in place a cached dbuf would not migrate its associated
arc_buf_hdr_t to the MFU list.  This would negatively impact
the ARC hit rate, particularly on systems with a small ARC.

This change reinstates the missing call to arc_access() from
dbuf_hold() by implementing a new arc_buf_access() function.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2018-04-16 00:54:58 +00:00
Brooks Davis
5aafa305af Remove device cm which was removed in r332490. 2018-04-15 15:06:07 +00:00
Warner Losh
5bc896bcc2 Make first a 'bool' instead of a 'boolean_t'.
'bool' is preferred to 'boolean_t'. We only get the boolean_t
definition by header pollution (though the same is true for
bool). Since we use both, switch entirely to bool.

Note: We still have TRUE/FALSE instead of true/false in heavy use in
the rest of the file. These are with ints of various flavors, so
that's appropriate, even though we should eventually migrate to bool
and true/false (though the tables they are in are nicely packed with
short and wouldn't be so nicely packed with bool, another reason
to leave it alone for now).
2018-04-14 22:14:18 +00:00
Navdeep Parhar
1131c927c4 cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload.  t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface.  The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so.  The general format
of a rule is: [socket-type] expr => settings

Example.  See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match.  There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by:	Chelsio Communications
2018-04-14 19:07:56 +00:00
Konstantin Belousov
23084818ff Set PG_G global mapping bit on the trampoline ptes.
Trampoline mappings are better treated as global since they are valid
in all address spaces, even for PTI.  pmap_invalidate_range() must work
on global mappings for pti since kernel_pmap invalidations are really
same as for non-PTI.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D15052
2018-04-14 17:33:16 +00:00
Eitan Adler
a3d2e7b1ca sys: remove 'cm' from notes
Followup to r332490

MFC After:	never
PR:		182297
2018-04-14 08:05:42 +00:00
Conrad Meyer
f6e61711ed cpufreq: Remove error-prone table terminators in favor of automatic sizing
PR:		227388
Reported by:	Vladimir Machulsky <xdelta AT meta.ua>
Sponsored by:	Dell EMC Isilon
2018-04-14 03:15:05 +00:00
Brooks Davis
3a4fc8a8a1 Remove support for the Arcnet protocol.
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR:		182297
Reviewed by:	jhibbits, vangyzen
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15057
2018-04-13 21:18:04 +00:00
Konstantin Belousov
d86c1f0dc1 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00
Brooks Davis
1315f9b59f Fix build on 32-bit systems. 2018-04-13 19:43:23 +00:00
Tycho Nightingale
6ac73777ea Add SDT probes to vmexit on Intel.
Submitted by:	domagoj.stolfa_gmail.com
Reviewed by:	grehan, tychon
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D14656
2018-04-13 17:23:05 +00:00
Warner Losh
c67f3c609b Just assert that the lock is held here, rather than taking it out and
dropping it.

Sponsored by: Netflix
2018-04-13 16:45:35 +00:00
Andrey V. Elsukov
56c989dff2 Add check that mbuf had not multicast layer2 address.
Such packets should be handled by ip6_mforward().

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2018-04-13 16:13:59 +00:00
Ruslan Bukin
b8f915ab24 Convert atse(4) driver for Altera Triple-Speed Ethernet MegaCore to use
xdma(4) interface.

This allows us to switch between Altera mSGDMA or SoftDMA engines used by
atse(4) device.

This also makes atse(4) driver become 25% smaller.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9618
2018-04-13 15:59:24 +00:00
Ruslan Bukin
36f5f2fb30 Add beripic1, msgdma and softdma instances.
Sponsored by:	DARPA, AFRL
2018-04-13 15:18:06 +00:00
Ruslan Bukin
8f89e7db08 Add driver for Altera SoftDMA® device.
SoftDMA is a software implementation of DMA engine built using Altera
FIFO component.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9620
2018-04-13 14:18:04 +00:00
Ram Kishore Vegesna
80b5058dcc Check if STACK is defined before using the stack(9).
PR: 227446
Reported by: emaste
Approved by: ken
2018-04-13 13:31:20 +00:00
Ruslan Bukin
4be5a951f6 Add driver for Altera modular Scatter-Gather DMA engine (mSGDMA).
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9619
2018-04-13 13:23:31 +00:00