Almost all of the memory used by the cards in the mpr(4) and mps(4)
drivers is required, according to the specs and Broadcom developers,
to be within a 4GB segment of memory.
This includes:
System Request Message Frames pool
Reply Free Queues pool
ReplyDescriptorPost Queues pool
Chain Segments pool
Sense Buffers pool
SystemReply message pool
We got a bug report from Dwight Engen, who ran into data corruption
in the BAE port of FreeBSD:
> We have a port of the FreeBSD mpr driver to our kernel and recently
> I found an issue under heavy load where a DMA may go to the wrong
> address. The test system is a Supermicro X10SRH-CLN4F with the
> onboard SAS3008 controller setup with 2 enterprise Micron SSDs in
> RAID 0 (striped). I have debugged the issue and narrowed down that
> the errant DMA is one that has a segment that crosses a 4GB
> physical boundary. There are more details I can provide if you'd
> like, but with the attached patch in place I can no longer
> re-create the issue.
> I'm not sure if this is a known limit of the card (have not found a
> datasheet/programming docs for the chip) or our system is just
> doing something a bit different. Any helpful info or insight would
> be welcome.
> Anyway, just thought this might be helpful info if you want to
> apply a similar fix to FreeBSD. You can ignore/discard the commit
> message as it is my internal commit (blkio is our own tool we use
> to write/read every block of a device with CRC verification which
> is how I found the problem).
The commit message was:
> [PATCH 8/9] mpr: fix memory corrupting DMA when sg segment crosses
> 4GB boundary
> Test case was two SSD's in RAID 0 (stripe). The logical disk was
> then partitioned into two partitions. One partition had lots of
> filesystem I/O and the other was initially filled using blkio with
> CRCable data and then read back with blkio CRC verify in a loop.
> Eventually blkio would report a bad CRC block because the physical
> page being read-ahead into didn't contain the right data. If the
> physical address in the arq/segs was for example 0x500003000 the
> data would actually be DMAed to 0x400003000.
The original patch was against mpr(4) before busdma templates were
introduced, and only affected the buffer pool (sc->buffer_dmat) in
the mpr(4) driver. After some discussion with Dwight and the
LSI/Broadcom developers and looking through the driver, it looks
like most of the queues in the driver are ok, because they limit
the memory used to memory below 4GB. The buffer queue and the chain
frames seem to be the exceptions.
This is pretty much the same between the mpr(4) and mps(4) drivers.
So, apply a 4GB boundary limitation for the buffer and chain frame pools
in the mpr(4) and mps(4) drivers.
Reported by: Dwight Engen <dwight.engen@gmail.com>
Reviewed by: imp
Obtained from: Dwight Engen <dwight.engen@gmail.com>
Differential Revision: <https://reviews.freebsd.org/D43008>
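The 4GB boundary problem in the mpr(4)/mps(4) report above can be modelled in a few lines of userland C. This is only a sketch: the function names are hypothetical, and the "wrapping" model (upper 32 bits latched at the start of a segment, only the low 32 bits advancing) is an assumption that happens to reproduce the 0x500003000 -> 0x400003000 example from the report, not documented hardware behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4GB (UINT64_C(1) << 32)

/*
 * True if the segment [addr, addr + len) touches more than one
 * 4GB-aligned window of physical memory. len must be non-zero.
 */
static bool
seg_crosses_4gb(uint64_t addr, uint64_t len)
{
    uint64_t last = addr + len - 1;

    return ((addr / BOUNDARY_4GB) != (last / BOUNDARY_4GB));
}

/*
 * Assumed model of the errant DMA: the upper 32 bits are taken from the
 * start of the segment and never change; only the low 32 bits advance.
 */
static uint64_t
wrapped_dma_addr(uint64_t seg_start, uint64_t offset)
{
    uint64_t high = seg_start & ~(BOUNDARY_4GB - 1);
    uint64_t low = (seg_start + offset) & (BOUNDARY_4GB - 1);

    return (high | low);
}
```

A busdma tag with a 4GB boundary keeps the first predicate false for every segment, so this wrap can never be reached.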
Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it
set) but splits the bwn_txhdr (first) segment into its own transaction.
Remove ic_headroom to avoid net80211 troubles with not enough space in
the mbuf around ieee80211_mbuf_adjust().
PR: 275616
MFC after: 3 days
The hardware timeout uses an 8-bit timeout value and expects the timeout
to be less than 255 seconds. Add a software timer implementation to time
out and abort I/Os whose timeout is more than 255 seconds.
Fix the timeout problem by dividing CAM timeouts by 1000 as hardware
expects timeout value in seconds. Before this change, CAM timeouts in
milliseconds were getting truncated to 8 bits and converted to seconds.
So the actual timeout used when going down to the card would depend on
the bottom 8 bits of the timeout used.
Add the mapping of ocs_fc error status to CAM status.
Reported by: ken
Reviewed by: ken
Tested by: ken, ram
Approved by: ken
MFC after: 1 week
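The truncation described in the ocs_fc timeout fix is easy to see with concrete numbers. The sketch below is a userland model, not the driver's code: the names and the struct are hypothetical, and treating exactly 255 seconds as a hardware timeout is an assumption, since the text only distinguishes "less than" and "more than" 255.

```c
#include <stdbool.h>
#include <stdint.h>

#define HW_TIMEOUT_MAX_SEC 255u    /* largest value an 8-bit field holds */

/* Old behaviour as described: the millisecond value was cut to 8 bits. */
static uint8_t
old_hw_timeout(uint32_t cam_timeout_ms)
{
    return ((uint8_t)cam_timeout_ms);
}

struct io_timeout {
    uint8_t  hw_sec;        /* valid only when use_sw_timer is false */
    bool     use_sw_timer;  /* timeout too long for the hardware field */
    uint32_t sw_sec;        /* valid only when use_sw_timer is true */
};

/* New behaviour: convert to seconds first, then pick hardware or software. */
static struct io_timeout
new_timeout(uint32_t cam_timeout_ms)
{
    struct io_timeout t = { 0, false, 0 };
    uint32_t sec = cam_timeout_ms / 1000;

    if (sec > HW_TIMEOUT_MAX_SEC) {
        t.use_sw_timer = true;
        t.sw_sec = sec;
    } else {
        t.hw_sec = (uint8_t)sec;
    }
    return (t);
}
```

With a 60000 ms CAM timeout the old path yields 96 (the low byte of 0xEA60), while the new path yields 60 seconds.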
Some Raspberry Pi boards pass smsc95xx.macaddr=XX:XX:XX:XX:XX:XX in
bootargs. Use this if no Ethernet address is found in an EEPROM.
As a last resort, fall back to ether_gen_addr() instead of a random MAC.
PR: 274092
Reported by: Patrick M. Hausen (via ML)
Reviewed by: imp, karels, zlei
Tested by: Patrick M. Hausen
Approved by: karels
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D42463
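Extracting the address from a bootargs string of the form shown above could look roughly like this. It is a userland sketch with a hypothetical function name; it does not reflect how the driver actually obtains or parses bootargs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look for "smsc95xx.macaddr=XX:XX:XX:XX:XX:XX" in bootargs and store
 * the six octets in mac[]. Returns false if the key is missing or the
 * value is malformed; mac[] is left untouched in that case.
 */
static bool
parse_bootargs_mac(const char *bootargs, uint8_t mac[6])
{
    static const char key[] = "smsc95xx.macaddr=";
    const char *p = strstr(bootargs, key);
    unsigned int b[6];
    int i;

    if (p == NULL)
        return (false);
    p += sizeof(key) - 1;
    if (sscanf(p, "%2x:%2x:%2x:%2x:%2x:%2x",
        &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6)
        return (false);
    for (i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];
    return (true);
}
```

A caller would try the EEPROM first, then this lookup, and only then fall back to a generated address.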
Without filter functions, we do not need to keep track of tag ancestry.
All inheritance of the parent tag's parameters occurs when creating the
new child tag.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42895
Address filter functions are unused, unsupported, and now rejected.
Simplify some busdma code by removing filter functionality completely.
Note that the chains of parent tags become useless, and will be cleaned
up in the next commit.
No functional change intended.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42894
struct siba_bhndb_softc embeds struct siba_softc and adds an extra
field, "quirks". In practice, this bug was harmless since "quirks" is
unconditionally initialized during driver attach and would have lived in
the redzone of the softc allocation, but KASAN catches the out-of-bounds
access.
PR: 275515
Reported by: Frank Hilgendorf <frank.hilgendorf@posteo.de>
MFC after: 1 week
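The softc layout issue above can be illustrated with stand-in structures. The struct and function names here are hypothetical, and the sketch assumes the bug was an allocation sized for the embedded base structure rather than the derived one, which the text implies but does not state outright.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the embedded base softc. */
struct base_softc {
    int dev_id;
    int nports;
};

/* Stand-in for the derived softc: base first, then one extra field. */
struct derived_softc {
    struct base_softc base;
    uint32_t quirks;
};

/* True if a field at [off, off + size) lies inside an allocation. */
static bool
field_in_bounds(size_t alloc_size, size_t off, size_t size)
{
    return (off + size <= alloc_size);
}
```

If only sizeof(struct base_softc) bytes are allocated, the quirks field sits past the end of the allocation, which is exactly the kind of access KASAN flags.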
Previously we were trying to set hca_cap_2 without checking the max
value of sw_vhca_id_valid, which is the only settable value inside
hca_cap_2. Since we don't have driver support for sw_vhca_id yet, there
is no need to set hca_cap_2 at all; it is enough to query it.
Fixes: 7b959396ca ("mlx5: Introduce new destination type TABLE_TYPE")
MFC after: 3 days
We have the mechanism in place to support encoding system registers
explicitly, so use that rather than requiring LLVM 13+, which breaks our
current set of GitHub CI builds.
Fixes: 9eecef0521 ("Add an Armv8 rndr random number provider")
Some of the I2C ioctl request structures contain pointers and need to
handle requests from 32-bit applications on 64-bit kernels.
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42836
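The reason pointer-carrying ioctl structures need special handling is that their layout differs between 32-bit and 64-bit ABIs. The following sketch uses hypothetical structure names (not the real I2C request types) to show the usual shape of the conversion: a fixed-width 32-bit mirror of the structure, widened into the native one.

```c
#include <stddef.h>
#include <stdint.h>

/* Native request as a 64-bit kernel sees it (hypothetical). */
struct xfer_req {
    void   *buf;
    size_t  len;
};

/* The same request as laid out by a 32-bit application (hypothetical). */
struct xfer_req32 {
    uint32_t buf;   /* 32-bit user pointer */
    uint32_t len;
};

/* Widen a 32-bit request into the native form. */
static void
xfer_req_from32(const struct xfer_req32 *r32, struct xfer_req *r)
{
    r->buf = (void *)(uintptr_t)r32->buf;
    r->len = r32->len;
}
```

The ioctl handler would recognise the 32-bit command value, convert on entry, run the common code, and convert any results back before returning.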
This reverts commit b65f813c1a.
As a side effect this also seems to fix wtap which seems to have
lost the epoch over the input path in between.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Just as was done for accept(2) in cfb1e92912, use the same approach
for two simpler syscalls that return socket addresses. Although these
two syscalls aren't performance critical, this change shares code
among the three syscalls, trimming code size.
Following the example of accept(2), provide VNET-aware and
INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around the
protosw methods.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694
Let the accept functions provide stack memory for protocols to fill in.
Generic code should provide a sockaddr_storage; specialized code may
provide a smaller structure.
While rewriting accept(2), make 'addrlen' a true in/out parameter that
reports the required length if the provided length was insufficient. Our
accept(2) manual page and POSIX don't explicitly require that, but one
can read the text as if they do. Linux also does that. Update tests
accordingly.
Reviewed by: rscheff, tuexen, zlei, dchagin
Differential Revision: https://reviews.freebsd.org/D42635
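The in/out 'addrlen' semantics described for accept(2) can be modelled in userland as follows. This is a sketch of the contract only, with a hypothetical helper name; it is not the kernel's copyout path.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an address of srclen bytes into a caller buffer whose capacity is
 * *addrlen. The copy is truncated to the capacity, and *addrlen is
 * updated to the length that would have been required.
 * Returns the number of bytes actually copied.
 */
static size_t
copyout_addr(void *dst, size_t *addrlen, const void *src, size_t srclen)
{
    size_t n = (*addrlen < srclen) ? *addrlen : srclen;

    memcpy(dst, src, n);
    *addrlen = srclen;
    return (n);
}
```

A caller that sees the returned length exceed what it passed in knows the address was truncated and how large a buffer to retry with.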
Some of the SMB ioctl request structures contain pointers and need to
handle requests from 32-bit applications on 64-bit kernels.
Obtained from: Juniper Networks, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42837
[Why]
For instance, it gives a chance to the new backend to refresh the
screen. This is needed by the vt_drmfb backend and `drm_fb_helper`.
This change was lost when I posted changes to reviews.freebsd.org and it
broke the amdgpu driver... Thanks to manu@ for reporting the problem
and to wulf@ for finding the missing change!
Tested by: manu
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D42834
In particular, this enables support for PCI config access for domains
(segments) other than 0.
Reported by: cperciva
Tested by: cperciva (m7i.metal-48xl AWS instance)
Reviewed by: imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D42828
This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance) (note that this
is not the same as a NUMA domain, but something PCI-specific). This
does not yet enable access to domains other than 0, but updates the
API to support domains.
Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0. A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.
The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.
The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42827
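To show why a domain (segment) argument matters for config access, here is a sketch of memory-mapped (ECAM-style) address computation in which each segment has its own base address. The function names and the table of per-segment bases are assumptions for illustration; they are not the pci_cfgreg implementation.

```c
#include <stdint.h>

/* ECAM gives each function 4KB: bus << 20 | slot << 15 | func << 12 | reg. */
static uint64_t
ecam_offset(int bus, int slot, int func, int reg)
{
    return (((uint64_t)bus << 20) | ((uint64_t)slot << 15) |
        ((uint64_t)func << 12) | (uint64_t)reg);
}

/*
 * Resolve a config-space address given a table of per-domain bases.
 * Returns 0 for an unknown domain. Callers that only know
 * bus/slot/function pass domain 0.
 */
static uint64_t
cfg_phys_addr(const uint64_t *seg_base, int nsegs, int domain,
    int bus, int slot, int func, int reg)
{
    if (domain < 0 || domain >= nsegs)
        return (0);
    return (seg_base[domain] + ecam_offset(bus, slot, func, reg));
}
```

Without the domain argument, the same bus/slot/function triple in two segments would be indistinguishable.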
No sense having a variable for this. So use BUS_SPACE_MAXADDR and remove
dma_hiaddr from softc.
Suggested by: jhb
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42808
Replace the test for DataLength == 0 with an assert. It can't happen,
but an assert doesn't hurt. Emacs removed some trailing white space too.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42807
Use the simpler template code for the parent busdma tag for all I/O to
this card.
Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42607
These calls "should" all be synchronous. No bouncing is needed for them
(at least in the typical case that we have a sane card that decodes more
bits of DMA address than we have memory), so no errors are possible.
Ensure these calls are really synchronous with BUS_DMA_NOWAIT flags
(which should never fail now that bus_dmamem_alloc() has succeeded).
Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42606
This usage is obsolete. Replace with maximum bus space size. maxphys
will sort itself out at higher levels.
Reviewed by: mav, jhb, imp
Differential Revision: https://reviews.freebsd.org/D42605
Publish the firmware version on the card like we do for mps/mpr.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42588
The number of significant bits that are decoded is returned in the flags
field of the IOCFacts structure from the device. Rather than assume the
worst with a pessimal 32-bit maximum, look at this value and pass it
along to all the DMA map creation requests.
A lot of those creations are repetitive and could just inherit from the
base tag if we moved to the templated interface. This is called out as
desirable future work not done at this time.
In addition, due to a chicken-and-egg problem, we have to allocate
some of the maps with a 32-bit lowaddr. These are the ones we need to
read IOCFacts. And they are fine to be so restricted: they are little
used after startup, and when they are used, bouncing is fine.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42559
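Turning a "number of decoded address bits" value into the highest usable DMA address is a small calculation. The sketch below uses a hypothetical function name and assumes that values below 32 fall back to the conservative 32-bit limit; how the driver actually decodes the IOCFacts flags field is not shown here.

```c
#include <stdint.h>

/*
 * Highest DMA address reachable with addr_bits decoded address bits.
 * Values below 32 fall back to the 32-bit limit (an assumption), and
 * 64 or more means the full address space.
 */
static uint64_t
dma_max_addr(unsigned int addr_bits)
{
    if (addr_bits < 32)
        return (UINT64_C(0xffffffff));
    if (addr_bits >= 64)
        return (UINT64_MAX);
    return ((UINT64_C(1) << addr_bits) - 1);
}
```

The result is the kind of value a driver would pass as the low-address limit when creating its DMA tags.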
Move enqueueing of commands to bus_dmamap_load_ccb callback
Fix a fundamental difference between FreeBSD and Linux. On Linux, your
DMA load callback always happens before the load returns, so drivers are
written to load the map, then submit to hardware. On FreeBSD, the
callback may be deferred and the load may return EINPROGRESS. This means
the callback is responsible for queueing the request to the hardware
after the SGL list is created. Make a number of interrelated changes:
At the end of mpi3mr_prepare_sgls, add a call to mpi3mr_enqueue_request.
Split the hardware submission out from the end of mpi3mr_action_scsiio
and move it into a new routine mpi3mr_enqueue_request.
Move all error completion from the end of mpi3mr_action_scsiio to where
the error is detected. We cannot pass errors back from the
mpi3mr_enqueue_request to do this on a 'failed' mpi3mr in a centralized
place (since it has to be fire and forget).
Add comments about zero length SGLs never making it into
mpi3mr_prepare_sgls. Keep the code there for the moment, but we only set
cm->data to non-NULL when scsiio_req->DataLength is not zero. So the
datalength can't be zero and we can't send the zero SGLs.
Add comments about other "impossible" tests in mpi3mr_prepare_sgls that
really should be simple asserts of some flavor.
Eliminate cm->error_code, since we can't pass data back from the
mpi3mr_prepare_sgl callback anymore.
In mpi3mr_map_request, call mpi3mr_enqueue_request for the no data case.
This seems to work even though we've not done the special zero length
handling that was in mpi3mr_prepare_sgls, giving further evidence to it
not actually being needed. This is needed for SCSI CDBs that have no
data to pass to the drive like TEST UNIT READY.
With this change, and the prior ones, we're now able to run with mpi3mr
on 128GB systems under very heavy disk load (so many buffers land above
4GB: the driver instructs busdma to never use memory above 4GB, which may
be too conservative, but that is an issue for another time).
Sponsored by: Netflix
Reviewed by: sumit.saxena_broadcom.com, mav, jhb
Differential Revision: https://reviews.freebsd.org/D42543
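The control flow described for the mpi3mr change, where the hardware submission happens inside the DMA load callback because the callback may run later, can be modelled in userland. Everything below is a simplified stand-in with hypothetical names: fake_dmamap_load() imitates a loader that either calls back immediately or defers and returns EINPROGRESS; it is not the busdma API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct request {
    bool sgl_built;   /* scatter/gather list has been filled in */
    bool enqueued;    /* request has been handed to the "hardware" */
};

typedef void (*load_cb_t)(struct request *);

/* One-slot queue standing in for busdma's deferred callbacks. */
static struct request *deferred_req;
static load_cb_t deferred_cb;

static void
enqueue_request(struct request *req)
{
    req->enqueued = true;
}

/* Load callback: build the SGL, then submit. Fire and forget. */
static void
prepare_sgls_cb(struct request *req)
{
    req->sgl_built = true;
    enqueue_request(req);
}

/* Either run the callback now, or park it and report EINPROGRESS. */
static int
fake_dmamap_load(struct request *req, load_cb_t cb, bool defer)
{
    if (defer) {
        deferred_req = req;
        deferred_cb = cb;
        return (EINPROGRESS);
    }
    cb(req);
    return (0);
}

/* Resources became available: run the parked callback, if any. */
static void
run_deferred(void)
{
    if (deferred_cb != NULL) {
        deferred_cb(deferred_req);
        deferred_cb = NULL;
        deferred_req = NULL;
    }
}
```

Code that submitted to hardware right after the load call returned would, in the deferred case, send a request whose SGL had not been built yet; submitting from the callback avoids that in both cases.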
Use mpi3mr_set_ccbstatus more uniformly in mpi3mr_action_scsiio. The
routine mostly used it, but also set the status by hand in places. In
those cases where we want to error out the request, use this routine.
As part of this, move setting CAM_SIM_QUEUED later in the function to
when we're sure it's been queued. Remove the places we clear it before
this.
Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D42542
Since we assume there's a timeout to cancel when this is true, only set
it true when we set the timeout. Otherwise we may try to cancel a timeout
when there's been an error in submission.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42541
Reduce the scope of reset_mutex to protect the msleep in the watchdog
thread as well as the MPI3MR_FLAGS_SHUTDOWN bit. Use it to protect the
wakeup in mpi3mr_detach so this thread can exit sooner when we're trying
to do an orderly shutdown. Optimize the flow to check the sleep and
other conditions before going to sleep.
It's an open question if this should protect sc->unrecoverable, and if
we should wake up the watchdog thread when we set it. We might also want
to move to booleans for the three flags that we have now in
mpi3mr_flags. There are a number of U8s that should really be bools and
we might want to also group them together to pack softc better.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D42539
All of these fields are either unused, or just initialized. Remove
them. This saves about 1MB of memory for the cards that I have which can
do 8k transactions at once.
Sponsored by: Netflix
Reviewed by: mav, jhb
Differential Revision: https://reviews.freebsd.org/D42538
Holding fwevt_lock when we call taskqueue_drain can lead to deadlock
because the queue being drained needs fwevt_lock to do its work: the
task thread will try to take the lock and block, so that thread never
finishes and taskqueue_drain never completes. There's a witness
warning/error for this which was exposed when the lock was converted to
a MTX_DEF lock from a MTX_SPIN prior to committing to the FreeBSD tree.
The lock appears to be to protect against additional items being added
to the event list while we're doing a reset. Since the taskqueue is
blocked, items can get added to the list, but won't be processed during
the reset, but there is still a (likely small) race between the
taskqueue_drain and the taskqueue_block calls where an interrupt could
fire on another CPU, resulting in a task being enqueued and started
before the block can take effect. The only way to fix that race is to
turn off interrupt processing during a reset. So we replace a deadlock
with a smaller race.
Sponsored by: Netflix
Reviewed by: sumit.saxena_broadcom.com, mav, jhb
Differential Revision: https://reviews.freebsd.org/D42537
The driver argument is most certainly now used by these functions. When
originally implemented it might have been unused, but not now.
Reviewed by: royger
These were needed in the past, but since then the interrupt code has
been successfully isolated from the Xen/PCI code. As such, this is a bit
of straightforward cleanup.
Differential Revision: https://reviews.freebsd.org/D32923
Reviewed by: royger
Fix a few spots where handle pointers were incorrectly used. Luckily
these appear to be rarely triggered, given how long they've been lurking.
Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 9f40021f28 ("Introduce a new, HVM compatible, paravirtualized timer driver for Xen.")
MFC after: 2 weeks
Reviewed by: royger
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix