138681 commits

Author SHA1 Message Date
Rick Macklem
690bcf605d param.h: Bump __FreeBSD_version for commit 18f5b477ee
Commit 18f5b477ee added two arguments to VOP_ALLOCATE().
As such, all NFS modules must be rebuilt from sources.

This is a direct commit.
2021-12-18 14:44:46 -08:00
Rick Macklem
18f5b477ee vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.

This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.

The VOP_ALLOCATE.9 man page will be patched separately.

(cherry picked from commit f0c9847a6c)
2021-12-18 14:30:25 -08:00
Vincenzo Maffione
56eeb84f10 em: skip rxcsum offload processing when disabled
Similarly to the other Intel drivers, don't try to process
RX checksum offloads when this feature (IFCAP_RXCSUM) is
disabled.

Reviewed by:	gallatin, kbowling, erj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33155

(cherry picked from commit d0633af765)
2021-12-18 12:00:25 +00:00
Vincenzo Maffione
a00d9c7f8c e1000: remove unused ifp backpointer
The ifp (struct ifnet) backpointer in the e1000 private ifnet
data is not used anymore since the iflib transition.
Remove it so that developers are not tempted to use it and
get a NULL pointer dereference.

Reviewed by:	markj, kbowling, erj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33157

(cherry picked from commit d91559564d)
2021-12-18 11:46:41 +00:00
Rick Macklem
f15f29c975 nfsd: Fix Verify for attributes like FilesAvail
When the Verify operation calls nfsv4_loadattr(), it provides
the "struct statfs" information that can be used for doing a
compare for FilesAvail, FilesFree, FilesTotal, SpaceAvail,
SpaceFree and SpaceTotal.  However, the code erroneously
used the "struct nfsstatfs *" argument that is NULL.
This patch fixes these cases to use the correct argument
structure.  For the case of FilesAvail, the code in
nfsv4_fillattr() was factored out into a separate function
called nfsv4_filesavail(), so that it can be called from
nfsv4_loadattr() as well as nfsv4_fillattr().

In fact, most of the code in nfsv4_filesavail() is old
OpenBSD code that does not build/run on FreeBSD, but I
left it in place, in case it is of some use someday.

I am not aware of any extant NFSv4 client that does Verify
on these attributes.

PR:	260176

(cherry picked from commit 2d90ef4714)
2021-12-17 17:36:41 -08:00
Alexander Motin
d87f1e2e36 Make msgbuf_peekbytes() not return leading zeroes.
Introduce a new MSGBUF_WRAP flag, indicating that the buffer has wrapped
at least once and does not keep zeroes from the last msgbuf_clear().
It allows msgbuf_peekbytes() to return only real data, not requiring
every consumer to trim the leading zeroes after doing a pointless copy.
The most visible effect is that the kern.msgbuf sysctl now always returns
a proper zero-terminated string, not only after the first buffer wrap.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.

(cherry picked from commit 81dc00331d)
2021-12-17 20:36:23 -05:00
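The msgbuf_peekbytes() change above (d87f1e2e36) can be illustrated with a small userland model. This is only a sketch of the idea described in the commit message, not the kernel's struct msgbuf: the class name, the fields, and the way the `wrapped` attribute stands in for MSGBUF_WRAP are all assumptions made for illustration.

```python
class MsgBuf:
    """Toy ring buffer (hypothetical model, not the kernel structure)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)  # zero-filled, as after a clear
        self.wseq = 0                # total number of bytes ever written
        self.wrapped = False         # stands in for the MSGBUF_WRAP flag

    def write(self, payload):
        for byte in payload:
            self.data[self.wseq % self.size] = byte
            self.wseq += 1
            if self.wseq % self.size == 0:
                # The write position went past the end at least once,
                # so no zeroes from the last clear remain.
                self.wrapped = True

    def peek(self):
        """Return only real data, oldest byte first."""
        if not self.wrapped:
            # Everything past wseq is still zero filler: do not return it.
            return bytes(self.data[:self.wseq])
        start = self.wseq % self.size
        return bytes(self.data[start:] + self.data[:start])
```

Before the first wrap the reader sees exactly the bytes written so far; after a wrap it sees the full buffer in chronological order, so no consumer has to trim leading zeroes.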
Rick Macklem
fac8422a41 nfsd: Sanity check the Layouttype count
PR:	260155

(cherry picked from commit 480be96e1e)
2021-12-17 17:32:47 -08:00
Ram Kishore Vegesna
de7809c1ee ocs_fc: Remove unused function declarations.
Remove unused function declarations.
Changes required for internal tool.

Approved by: ken

(cherry picked from commit 6f78736cb1)
2021-12-17 16:01:06 +05:30
Ram Kishore Vegesna
161148e61f ocs_fc: Fix device lost timer where device is not getting deleted.
Issue: Devices won't go away after link down.

Device lost timer functionality in ocs_fc is broken,
`is_target` flag is not set in the target database and target delete is skipped.

Fix: Remove unused flags and delete the device when timer expires.

Reported by: ken@kdm.org
Reviewed by: mav, ken

(cherry picked from commit 41e9466943)
2021-12-17 16:01:06 +05:30
Ram Kishore Vegesna
0fff02894f ocs_fc: When commands complete with an error, freeze the device queue.
Proper error recovery depends on freezing the device queue when an
error occurs, so we can recover from an error before sending
additional commands.

The ocs_fc(4) driver was not freezing the device queue for most
SCSI errors, and that broke error recovery.

sys/dev/ocs_fc/ocs_cam.c:
	In ocs_scsi_initiator_io_cb(), freeze the device queue if
        we're passing back status other than CAM_REQ_CMP.

Submitted by: ken@kdm.org
Reviewed by: mav, ken

(cherry picked from commit d063d1bc92)
2021-12-17 16:01:06 +05:30
Ram Kishore Vegesna
4821da88ab ocs_fc: Fix CAM status reporting in ocs_fc(4) when no data is returned.
In ocs_scsi_initiator_io_cb(), if the SCSI command that is
        getting completed had a residual equal to the transfer length,
        it was setting the CCB status to CAM_REQ_CMP.

        That breaks the expected behavior for commands like READ ATTRIBUTE.
        For READ ATTRIBUTE, if the first attribute requested doesn't exist,
        the command is supposed to return an error (Illegal Request,
        Invalid Field in CDB).  The broken behavior for READ ATTRIBUTE
        caused LTFS tape formatting to fail.  It looks for attribute
        0x1623, and expects to see an error if the attribute isn't present.

        In addition, if the residual is negative (indicating an overrun),
        only set the CCB status to CAM_DATA_RUN_ERR if we have not already
        reported an error.  The SCSI sense data will have more detail about
        what went wrong.

        sys/dev/ocs_fc/ocs_cam.c:
                In ocs_scsi_initiator_io_cb(), don't set the status to
                CAM_REQ_CMP if the residual is equal to the transfer length.

                Also, only set CAM_DATA_RUN_ERR if we didn't get SCSI
                status.

Submitted by: ken@kdm.org
Reviewed by: mav, ken

(cherry picked from commit 1af49c2eeb)
2021-12-17 16:01:05 +05:30
Ram Kishore Vegesna
661875cecb ocs_fc: Increase maximum supported SG elements to support larger transfer sizes.
Reported by: ken@kdm.org
Reviewed by: mav, ken

(cherry picked from commit 322dbb8ce8)
2021-12-17 16:01:05 +05:30
Ram Kishore Vegesna
ccd32c6513 ocs_fc: Emulex Gen 7 HBA support.
Emulex Gen7 adapter support in ocs_fc driver.

Reviewed by: mav, ken

(cherry picked from commit 3bf42363b0)
2021-12-17 16:01:05 +05:30
Ram Kishore Vegesna
44a78c21df ocs_fc: Add gendump and dump_to_host ioctl command support.
Support to generate firmware dump.

Approved by: mav(mentor)

(cherry picked from commit 29e2dbd42c)

Add ocs_gendump.c to the build, missed in 29e2dbd42c.

(cherry picked from commit d0732fa819)
2021-12-17 16:00:59 +05:30
Ram Kishore Vegesna
5749a57326 ocs_fc: Fix use after free bug in ocs_hw_async_call()
Freed ctx is used in the later callee ocs_hw_command(),
which is a use after free bug.

Return error if sli_cmd_common_nop() failed.

PR: 255865
Reported by: lylgood@foxmail.com
Approved by: markj

(cherry picked from commit 7377d3831b)
2021-12-17 15:42:25 +05:30
Ram Kishore Vegesna
f7a7748afc ocs_fc: Fix a use after free in ocs_sport_free
Domain which could be freed is used while freeing the sport.
Use ocs from sport.

PR: 255866
Reported by: lylgood@foxmail.com
Approved by: markj

(cherry picked from commit dd722ccd6e)
2021-12-17 15:42:09 +05:30
Ram Kishore Vegesna
92d579a9ae ocs_fc: Fix memory leak in ocs_scsi_io_alloc()
PR: 254690
Approved by: mav(mentor)
MFC after: 2 weeks

(cherry picked from commit fc620f9782)
2021-12-17 15:41:49 +05:30
Andriy Gapon
6820d578bb twsi: support more message combinations in transfers
Most prominently, add support for a transfer where a write with no-stop
flag is followed by a write with no-start flag.  Logically, it's a
single larger write, but consumers may want to split it like that
because one part can be a register ID and the other part can be data to
be written to (or starting at) that register.

Such a transfer can be created by i2c tool and iic(4) driver, e.g., for
an EEPROM write at specific offset:
    i2c -m tr -a 0x50 -d w -w 16 -o 0 -c 8 -v < /dev/random

This should be fixed by new code that handles the end of data transfer
for both reads and writes.  It handles two existing conditions and one
new.  Namely:
- the last message has been completed -- end of transfer;
- a message has been completed and the next one requires the start
  condition;
- a message has been completed and the next one should be sent without
  the start condition.

In the last case we simply switch to the next message and start sending
its data.  Reads without the start condition are not supported yet,
though.  That's because we NACK the last byte of the previous message,
so the device stops sending data.  To fix this we will need to add a
look-ahead at the next message when handling the penultimate byte of the
current one.

This change also fixed a bug where msg_idx was not incremented after a
read message.  Apparently, a read message is typically the last message
in a transfer, so the bug did not cause much trouble.

PR:		258994

(cherry picked from commit ff1e858180)
2021-12-17 09:30:58 +02:00
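The three end-of-message cases listed in the twsi commit above (6820d578bb) can be sketched as a small decision function. This is a hedged userland model, not the driver's state machine: the function name, the returned labels, and the numeric flag values are assumptions chosen for illustration.

```python
# Flag values are assumed here for illustration only.
IIC_M_RD = 0x01       # message is a read
IIC_M_NOSTOP = 0x02   # do not send a stop after this message
IIC_M_NOSTART = 0x04  # do not send a start before this message


def after_message(msgs, idx):
    """Decide what should happen once message `idx` has completed."""
    if idx == len(msgs) - 1:
        return "end-of-transfer"
    nxt = msgs[idx + 1]
    if not (nxt["flags"] & IIC_M_NOSTART):
        return "send-start"
    if nxt["flags"] & IIC_M_RD:
        # Per the commit message, reads without a start condition
        # are not supported yet.
        return "unsupported"
    return "continue-without-start"
```

For a register write split into an ID part and a data part, the first message carries the no-stop flag and the second the no-start flag, so the model simply switches to the next message.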
Andriy Gapon
86af87acd1 twsi: make data receiving code safer
Assert that we are not receiving data beyond the requested length.
Assert that we have not NACK-ed incoming data prematurely.
Abort the current transfer if the incoming data is NACK-ed or not
NACK-ed unexpectedly.

Add debug logging of received data to complement logging of sent data.

(cherry picked from commit 00c07d9559)
2021-12-17 09:30:54 +02:00
Andriy Gapon
74f175202c twsi: remove redundant write of control register
The write at the end of twsi_intr() already handles all cases, no need
to have another write for TWSI_STATUS_START / TWSI_STATUS_RPTD_START.

(cherry picked from commit aeacf172fd)
2021-12-17 09:30:48 +02:00
Andriy Gapon
c7bf34cfe5 twsi: move handling of TWSI_CONTROL_ACK into the state machine
Previously the code set TWSI_CONTROL_ACK in twsi_transfer() based on
whether the first message had a length of one.  That was done regardless
of whether the message was a read or write and what kind of messages
followed it.
Now the bit is set or cleared while handling TWSI_STATUS_ADDR_R_ACK
state transition based on the current (read) message.

The old code did not work correctly in a scenario where a single byte
was read from an EEPROM device with two-byte addressing.
For example:
    i2c -m tr -a 0x50 -d r -w 16 -o 0 -c 1 -v
The reason is that the first message (a write) has two bytes, so
TWSI_CONTROL_ACK was set and never cleared.
Since the controller did not send NACK, the EEPROM sent more data,
resulting in a buffer overrun.

While working on TWSI_STATUS_ADDR_R_ACK I also added support for
the zero-length read access and then I did the same for zero-length write
access.
While rare, those types of I2C transactions are completely valid and are
used by some devices.

PR:		258994

(cherry picked from commit 04622a7f21)
2021-12-17 09:30:41 +02:00
Andriy Gapon
9c0050b0a6 kern_tc: unify timecounter to bintime delta conversion
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small.  But for a large delta (greater than approximately the equivalent
of 1 second) the calculations were different.  Both functions use
approximate calculations based on th_scale that avoid division.  Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be.  tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.

As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta).  Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off.  tc_windup does the switch using its own
calculations of a new th_offset using the large delta.  As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime.  So, for a period of time new binuptime values may
be "back in time" compared to values just before the switch.

Such a jump must never happen.  All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken.  For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it.  In that case the target thread
would never get woken up.

The unified calculations should ensure the monotonic property of the
uptime.

The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100).  But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.

Reviewed by:	kib
Sponsored by:	Panzura LLC

(cherry picked from commit 3d9d64aa18)
2021-12-17 09:28:24 +02:00
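The kern_tc commit above (9c0050b0a6) argues that two different approximations of the same delta conversion can disagree once the delta is large. The toy model below illustrates only that general point. It does not reproduce the kernel's arithmetic or the direction of its rounding error: the scale computation, the two conversion functions, and their names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def make_scale(freq):
    """Fixed-point seconds per tick in units of 2**-64 (floor; assumed)."""
    return (1 << 64) // freq


def convert_by_multiply(scale, delta):
    """Approximation A: multiply the whole delta by the scale."""
    total = scale * delta
    return total >> 64, total & MASK64   # (seconds, 64-bit fraction)


def convert_by_whole_seconds(scale, freq, delta):
    """Approximation B: peel off whole seconds, scale only the rest."""
    sec, rem = divmod(delta, freq)
    return sec, scale * rem              # rem < freq, so this fits 64 bits


def delta_to_bintime(scale, delta):
    """Single shared helper: every caller gets the same answer."""
    return convert_by_multiply(scale, delta)
```

For deltas shorter than one second both approximations agree, but for longer deltas they differ in the fraction. If a reader uses one and the updater uses the other, consecutive results need not be ordered. Routing both through one helper, as the commit describes, removes that mismatch.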
Alexander Motin
b7da472979 APEI: Improve multiple error sources handling.
Some AMD systems I have report 8 NMI and 3591 polled error sources.
Previous code could handle only one NMI source and used a separate
callout for each polled source.  New code can handle multiple NMIs
and groups polled sources by power of 2 of the polling period.

MFC after:	2 weeks

(cherry picked from commit 2dfc1f7355)
2021-12-15 21:32:36 -05:00
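Grouping polled sources by power of 2 of the polling period, as the APEI commit above (b7da472979) describes, could look roughly like the sketch below. Whether the driver rounds periods down or up, and the units it uses, are not stated in the commit message. Rounding down and milliseconds are assumptions here, as are the function names.

```python
from collections import defaultdict


def bucket_for(period_ms):
    """Return floor(log2(period_ms)) for a period of at least 1 ms."""
    return period_ms.bit_length() - 1


def group_sources(periods_ms):
    """Map each bucket to the list of source indexes that fall in it."""
    groups = defaultdict(list)
    for source_id, period in enumerate(periods_ms):
        groups[bucket_for(period)].append(source_id)
    return dict(groups)
```

With thousands of polled sources, this kind of bucketing needs one timer per bucket rather than one per source.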
Mark Johnston
55e020a6f9 amd64: Reduce the amount of cpuset copying done for TLB shootdowns
We use pmap_invalidate_cpu_mask() to get the set of active CPUs.  This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.

Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers.  Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ab12e8db29)
2021-12-15 08:31:48 -05:00
Kornel Duleba
77b3cf989f pci: Don't try to read cfg registers of non-existing devices
Instead of returning 0xffs, some controllers, such as Layerscape, generate
an external exception when someone attempts to read any register
of config space of a non-existing device other than PCIR_VENDOR.
This causes a kernel panic.
Fix it by bailing during device enumeration if a device vendor register
returns an invalid value (0xffff).
Use this opportunity to replace some hardcoded values with a macro.

I believe that this change won't have any unintended side-effects since
it is safe to assume that vendor == 0xffff -> hdr_type == 0xffff.

Sponsored by:		Alstom
Obtained from:		Semihalf
Reviewed by:		jhb
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D33059

(cherry picked from commit 68cbe189fd)
2021-12-15 11:46:46 +01:00
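The enumeration guard from the pci commit above (77b3cf989f) can be modelled in userland as follows. This is a sketch under stated assumptions: FakeBus, enumerate_slots, and the exception standing in for the external abort are hypothetical, and the register offsets and the invalid-vendor constant are written out here for illustration only.

```python
PCIR_VENDOR = 0x00     # vendor ID register offset (assumed here)
PCIR_HDRTYPE = 0x0E    # header type register offset (assumed here)
PCIV_INVALID = 0xFFFF  # "no device" vendor value, per the commit text


class FakeBus:
    """Hypothetical controller that faults on reads of absent devices."""

    def __init__(self, present):
        self.present = present  # slot -> vendor ID
        self.reads = []         # log of (slot, register) accesses

    def read_config(self, slot, reg):
        self.reads.append((slot, reg))
        if slot not in self.present:
            if reg != PCIR_VENDOR:
                raise RuntimeError("external abort")  # models the panic
            return PCIV_INVALID
        return self.present[slot] if reg == PCIR_VENDOR else 0x00


def enumerate_slots(bus, nslots):
    """Probe each slot, bailing out early when the vendor is invalid."""
    found = []
    for slot in range(nslots):
        if bus.read_config(slot, PCIR_VENDOR) == PCIV_INVALID:
            continue  # never touch other registers of a missing device
        found.append((slot, bus.read_config(slot, PCIR_HDRTYPE)))
    return found
```

Only the vendor register is ever read for an empty slot, so the faulting path is never reached.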
Alexander Motin
98a6200ee2 mca: Switch to using taskqueue_enqueue_timeout_sbt().
Previously it was not allowed on fast taskqueues.  It was fixed in
4730a8972b.  This should make no functional change, just a bit
cleaner and more efficient code.

MFC after:	1 week

(cherry picked from commit 9a128e1678)
2021-12-14 23:00:17 -05:00
Alexander Motin
550ccfd8f6 mca: Decode new Intel status bits.
MFC after:	1 week

(cherry picked from commit 3bdba24c74)
2021-12-14 23:00:17 -05:00
Alexander Motin
218457166f mca: Remove excessively verbose debug messages.
Especially in the case of AMD there were more than a dozen lines per CPU.

MFC after:	1 week

(cherry picked from commit 935dc0de88)
2021-12-14 23:00:17 -05:00
Alexander Motin
00c069c68c mca: Make some sysctls also loader tunables.
MFC after:	1 week

(cherry picked from commit c2003f2684)
2021-12-14 23:00:17 -05:00
Konstantin Belousov
4305fd126c Kernel linkers: add emergency sysctl to restore old behavior
PR:	207898

(cherry picked from commit ecd8245e0d)
2021-12-15 03:41:29 +02:00
Konstantin Belousov
da536d64b7 kernel linker: do not read debug symbol tables for non-debug symbols
PR:	207898

(cherry picked from commit 95c20faf11)
2021-12-15 03:41:29 +02:00
Konstantin Belousov
b23c24558b linker_debug_symbol_values(): use proper linker interface to get debug values
(cherry picked from commit 72f6662662)
2021-12-15 03:41:29 +02:00
Konstantin Belousov
ac7f6d2400 x86: add a comment providing source for numbers in legacy XSAVE area layout
(cherry picked from commit 0e6b06d5c8)
2021-12-15 03:41:29 +02:00
Konstantin Belousov
1d6ebddb62 amd64: correct size of the SSE area in the xsave layout
(cherry picked from commit 73b357be92)
2021-12-15 03:41:28 +02:00
Rick Macklem
6a6b08d464 nfsd: Do not try to cache a reply for NFSERR_BADSLOT
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid.  As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.

PR:	260076

(cherry picked from commit 33d0be8a92)
2021-12-14 17:34:05 -08:00
Rick Macklem
081d40b8ea nfsd: Sanity check the ACL attribute
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.

The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.

PR:	260111

(cherry picked from commit fd020f197d)
2021-12-14 17:32:27 -08:00
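The kind of sanity checking described in the nfsd ACL commit above (081d40b8ea) can be sketched as a bounds-checked parser. This is not nfsrv_dissectacl(). The wire layout used here (a big-endian count, then per entry three 32-bit fields and a length-prefixed, 4-byte padded "who" string), the limits, and all names are assumptions for illustration.

```python
import struct

MAX_ACES = 1024     # hypothetical upper bound on the ACE count
MAX_WHO_LEN = 1024  # hypothetical upper bound on a "who" string


def parse_acl(buf):
    """Parse an ACL, rejecting insane counts and sizes up front."""
    if len(buf) < 4:
        raise ValueError("truncated ACL")
    (acecnt,) = struct.unpack_from(">I", buf, 0)
    if acecnt > MAX_ACES:
        raise ValueError("unreasonable ACE count")
    off = 4
    aces = []
    for _ in range(acecnt):
        if off + 16 > len(buf):
            raise ValueError("truncated ACE")
        acetype, flag, mask, wholen = struct.unpack_from(">IIII", buf, off)
        off += 16
        if wholen == 0 or wholen > MAX_WHO_LEN:
            raise ValueError("unreasonable who length")
        padded = (wholen + 3) & ~3  # round up to a 4-byte boundary
        if off + padded > len(buf):
            raise ValueError("truncated who string")
        who = buf[off:off + wholen].decode("utf-8")
        off += padded
        aces.append((acetype, flag, mask, who))
    return aces
```

The point of the sketch is the order of operations: every count and length is validated against both a fixed limit and the remaining buffer before it is used.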
Alan Somers
0bade34633 fusefs: update atime on reads when using cached attributes
When using cached attributes, whether or not the data cache is enabled,
fusefs must update a file's atime whenever it reads from it, so long as
it wasn't mounted with -o noatime.  Update it in-kernel, and flush it to
the server on close or during the next setattr operation.

The downside is that close() will now frequently trigger a FUSE_SETATTR
upcall.  But if you care about performance, you should be using
-o noatime anyway.

Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33145

(cherry picked from commit 91972cfcdd)

fusefs: fix 32-bit build of the tests after 91972cfcdd

(cherry picked from commit d109559ddb)
2021-12-14 15:15:53 -07:00
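The atime policy in the fusefs commit above (0bade34633) amounts to "update locally on read, flush lazily". A minimal model of that policy, assuming a hypothetical CachedNode class and a list that records would-be FUSE_SETATTR upcalls, might look like this:

```python
class CachedNode:
    """Hypothetical stand-in for a file with cached attributes."""

    def __init__(self, noatime=False):
        self.noatime = noatime
        self.atime = 0.0
        self.atime_dirty = False
        self.setattr_upcalls = []  # records what would go to the server

    def read(self, now):
        # Update the cached atime in place unless mounted with noatime.
        if not self.noatime:
            self.atime = now
            self.atime_dirty = True

    def flush(self):
        """Called on close, or together with the next setattr."""
        if self.atime_dirty:
            self.setattr_upcalls.append(self.atime)
            self.atime_dirty = False
```

Several reads collapse into a single upcall at close time, which is the trade-off the commit message describes. With noatime no upcall is generated at all.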
Alan Somers
000ce6dee1 fusefs: fix copy_file_range when extending a file
When copy_file_range extends a file, it must update the cached file
size.

Reviewed by:	rmacklem, pfg
Differential Revision: https://reviews.freebsd.org/D33151

(cherry picked from commit 65d70b3bae)
2021-12-14 14:49:56 -07:00
Alan Somers
a28611cfc8 fusefs: delete a redundant getnanouptime
It's been redundant since SVN r346060 added another getnanouptime just
above.

(cherry picked from commit 8fbae6c7bd)
2021-12-14 14:49:18 -07:00
Eugene Grosbein
0b12cc411b if_epair: MFC: fix module build outside of kernel build environment
(cherry picked from commit 7a382e744b)
2021-12-14 20:22:25 +07:00
Andrew Turner
e50a0a8c44 Remove redundant declarations
These are already defined in the same file.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit dd978721a2)
2021-12-14 10:58:01 +00:00
Andrew Turner
cf63f12a51 Move the arm64 identify_cpu SYSINIT earlier
It is used by late ifunc resolvers so needs to be at an earlier stage
of the boot. Previously it was at the same stage so may not have run
before the ifunc resolvers.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 44ca369051)
2021-12-14 10:58:01 +00:00
Andrew Turner
d7d55d9f75 Move KHELP_DECLARE_MOD_UMA later in the boot
Both KHELP_DECLARE_MOD_UMA and the kernel linker SYSINIT to find
in-kernel modules run at SI_SUB_KLD, SI_ORDER_ANY. As the former
depends on the latter running first move it later in the boot,
to the new SI_SUB_KHELP. This ensures KHELP_DECLARE_MOD_UMA
module SYSINIT functions will be after the kernel linker.

Previously we may have received a panic similar to the following if
the order was incorrect:

panic: module_register_init: module named ertt not found

Reported by:	bob prohaska <fbsd AT www.zefox.net>
Discussed with:	imp, jhb
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ae062ff269)
2021-12-14 10:58:01 +00:00
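The ordering problem fixed by the KHELP_DECLARE_MOD_UMA commit above (d7d55d9f75) comes from two initializers sharing the same (subsystem, order) key. A toy model of that ordering, with placeholder numeric values that are not the real constants and with hypothetical entry names, is:

```python
# Placeholder values: only their relative order matters in this sketch.
SI_SUB_KLD = 0x2000
SI_SUB_KHELP = 0x2080
SI_ORDER_ANY = 0xFFFFFFF


def boot_order(entries):
    """Sort (name, subsystem, order) tuples the way a SYSINIT walk would.

    Entries with an identical key keep their input order, which models
    the fact that their relative order is not guaranteed.
    """
    ordered = sorted(entries, key=lambda e: (e[1], e[2]))
    return [name for name, _sub, _order in ordered]
```

With both entries at SI_SUB_KLD, SI_ORDER_ANY the result depends on input order, so the khelp initializer may run before the linker has registered in-kernel modules. Moving it to a later subsystem makes the order deterministic.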
Andrew Turner
9b87a3a65a Print the correct register for the arm64 elr
In 7ec86b6609 ("Also print symbols when printing arm64 registers")
a new function was created to print most registers. Unfortunately the
Link Register (LR) was being printed when we should have printed the
Exception Link Register (ELR).

Fix this by adding the missing 'e'.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 62cbc00d2f)
2021-12-14 10:58:01 +00:00
Andrew Turner
aa3b5d79b2 Pass the ACPI ID when reading the ACPI domain
The ACPI ID may not be the same as the FreeBSD CPU id. Use the former
when finding the CPU domain as there is no requirement for it to be
identical to the latter.

Reported by:	dch, kevans
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32546

(cherry picked from commit 4fb002805e)
2021-12-14 10:58:01 +00:00
Andrew Turner
b7c23efd74 Stop reading the arm64 domain when it's known
There is no need to read the domain on arm64 when there is only one
in the ACPI tables. This can also happen when the table is missing
as it is unneeded.

Reported by:	dch
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 0906563718)
2021-12-14 10:58:01 +00:00
Andrew Turner
f51997c6e4 Allocate arm64 per-CPU data in the correct domain
To minimise NUMA traffic allocate the pcpu, dpcpu, and boot stacks in
the correct domain when possible.

Submitted by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32338

(cherry picked from commit a90ebeb5fe)
2021-12-14 10:58:01 +00:00
Doug Moore
0848451a2e Set uninitialized popmap bits in vm_reserv_init
In vm_reserv_init, set all the marker popmap bits,
and not just the bits of the first popmap entry.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33258

(cherry picked from commit 9f32cb5b1c)
2021-12-13 23:09:13 -06:00
Cy Schubert
d3a7b7168c ip_log: remove set-but-not-used vars
(cherry picked from commit 664882ab16)
2021-12-13 17:13:05 -08:00
Konstantin Belousov
bd914cea9e amd64: Only build aout.ko when COMPAT_FREEBSD32 is enabled
(cherry picked from commit 0f2d88d1eb)
2021-12-14 02:44:01 +02:00