The pidx argument of isc_rxd_flush() indicates which is the last valid
receive descriptor to be used by the NIC. However, current code has
multiple issues:
- Intel drivers write pidx to their RDT register, which means that
NICs will only use the descriptors up to pidx-1 (modulo ring size N),
and won't actually use the one pointed to by pidx. This does not break
reception, but it is confusing and suboptimal (the NIC will
actually see only N-2 descriptors as available, rather than N-1).
Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic.
- The semantic used by Intel (RDT is one descriptor past the last
valid one) is used by most (if not all) NICs, and it is also used
on the TX side (also in iflib). Since iflib is not currently
using this semantic for RX, it must decrement fl->ifl_pidx
(modulo N) before calling isc_rxd_flush(), and then the
per-driver callback implementation must increment the index
again (to match the real semantic). This is confusing and suboptimal.
- The iflib refill function is also called at initialization.
However, in case the ring size is smaller than 128 (e.g. if_mgb),
the refill function will actually prepare all the receive
descriptors (N), without leaving one unused, as most NICs assume
(e.g. to prevent RDT from overrunning RDH). I can speculate that the code
looks like this right now because this issue showed up during
testing (e.g. with if_mgb), and it was easy to work around by
decrementing pidx before isc_rxd_flush().
The goal of this change is to simplify the code (removing a bunch
of instructions from the RX fast path), and to make the semantic of
isc_rxd_flush() consistent across drivers. To achieve this, we:
- change the semantics of the pidx argument to the usual one (that
is the index one past the last valid one), so that both iflib and
drivers avoid the decrement/increment dance.
- fix the initialization code to prepare at most N-1 descriptors.
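As a rough illustration of the new semantic, the index arithmetic can be
sketched as below. The helper names are hypothetical and are not part of
iflib; the sketch only assumes a ring of N descriptors, a producer index
pidx that is one past the last valid descriptor, and a consumer index cidx
owned by the NIC.

```c
#include <stdint.h>

/*
 * Illustrative only: these helpers are hypothetical, not iflib API.
 * N is the ring size, pidx is one past the last descriptor handed to
 * the NIC, and cidx is the NIC's consumer (head) index.
 */

/* Advance a ring index by n slots, wrapping at ring size N. */
static inline uint32_t
ring_advance(uint32_t idx, uint32_t n, uint32_t N)
{
	return ((idx + n) % N);
}

/* Descriptors currently owned by the NIC: those in [cidx, pidx). */
static inline uint32_t
ring_in_use(uint32_t cidx, uint32_t pidx, uint32_t N)
{
	return ((pidx + N - cidx) % N);
}

/* Slots that may still be filled while always keeping one slot unused. */
static inline uint32_t
ring_free_slots(uint32_t cidx, uint32_t pidx, uint32_t N)
{
	return (N - 1 - ring_in_use(cidx, pidx, N));
}

/* Tail value after an initial fill that prepares at most N-1 descriptors. */
static inline uint32_t
ring_initial_tail(uint32_t N)
{
	return (ring_advance(0, N - 1, N));
}
```

With N = 8, an empty ring has 7 usable slots and the initial tail is 7, so
the tail never catches up with the head.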
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26191
Previously we were relying on ether_ifattach() to set if_mtu, but
max_frame_size is initialized earlier. This fixes a regression
introduced by r250375.
PR: 249050
Submitted by: Christian Vallières <novacrash_@hotmail.com>
MFC after: 3 days
Implement support for an eSDHC controller found in NXP QorIQ Layerscape SoCs.
This driver has been tested with NXP LS1046A and LX2160A (Honeycomb board),
which is incompatible with the existing sdhci_fsl driver (aiming at older
chips from this family). As such, it is not intended as a replacement for
the old driver, but rather serves as an improved alternative for SoCs that
support it.
It comes with support for both PIO and Single DMA modes and samples the
clock from the extres clk API.
Submitted by: Artur Rojek <ar@semihalf.com>
Reviewed by: manu, mmel, kibab
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D26153
When enabling the MMU on arm64 we need to ensure the TLB invalidation has
completed before setting the enable bit in the SCTLR register.
Reported by: alc
Sponsored by: Innovate UK
Previously the send tag was set up in the background, and all packets for
the given send tag were dropped until it was ready. Change this to
blocking behaviour so that once the setsockopt() for enabling TLS completes,
the socket is ready to send packets. Do this by simply flushing the
work request which does the needed firmware programming during send
tag allocation.
MFC after: 1 week
Sponsored by: Mellanox Technologies // Nvidia
PMCLOG macros were always using 32-bit addresses, even on PPC64.
This resulted in truncated addresses in logs, when running on 64-bit PPC
machines.
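The effect can be illustrated with a minimal sketch. The helper names are
hypothetical and only mimic storing an address in a log record; they are
not the PMCLOG macros themselves.

```c
#include <stdint.h>

/* Hypothetical helper: store an address in a 32-bit log field. */
static inline uint64_t
log_addr32(uint64_t addr)
{
	return ((uint32_t)addr);	/* upper 32 bits are lost */
}

/* Hypothetical helper: store an address in a pointer-sized log field. */
static inline uint64_t
log_addr_native(uint64_t addr)
{
	return ((uintptr_t)addr);	/* full width on a 64-bit target */
}
```

For example, a 64-bit address such as 0xc000000000123456 comes back as
0x00123456 from the 32-bit variant.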
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26112
Calling pmc_hook inside a critical section may result in a panic.
This happens when the user callchain is fetched, because it uses
pmap_map_user_ptr, which tries to get the (sleepable) pmap lock when the
needed vsid is not found.
Judging by the implementation on other platforms, intr_irq_handler in
kern/subr_intr.c, and what pmc_hook does, it seems safe to move pmc_hook
outside the critical section.
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26111
Add support for stage 2 pmap to pmap_pte_dirty, pmap_release, and more
of pmap_enter. This adds support in all places I have hit while testing
bhyve while faulting pages in as needed.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26065
The "Intel Sunrise Point-LP USB 3.0 controller" doesn't update the wMaxPacket
field in the control endpoint context automatically, causing a BABBLE error
code on the first USB device descriptor read when the bMaxPacketSize is not
8 bytes.
Reported by: wulf@
PR: 248784
MFC after: 1 week
Sponsored by: Mellanox Technologies
Save 7k of text space by using a simpler crc32 for the standalone case. We
don't need all that fancy optimization in the boot loader, so use a
simplified version of the CRC function. We could save more by doing it
one bit at a time rather than 32, but this is the biggest savings at
the smallest performance hit.
With LUA and verified exec, gptboot, gptzfsboot and friends are pushing
the ~530k limit and every little bit helps.
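For reference, the even smaller bit-at-a-time variant mentioned above could
look roughly like the sketch below. This is not the code that was committed;
the function name is hypothetical, and the sketch assumes the common
reflected CRC-32 polynomial 0xEDB88320.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-32 using the reflected polynomial 0xEDB88320.
 * It needs no lookup table, so it costs very little text space, at
 * the price of eight shift/xor steps per input byte.
 * The name crc32_bitwise is hypothetical.
 */
static uint32_t
crc32_bitwise(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++) {
			/* Mask is all ones when the low bit is set, else zero. */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
		}
	}
	return (crc ^ 0xFFFFFFFFu);
}
```

The trade-off is between table size and per-byte work: larger tables
process more bits per step but take more text space.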
Reviewed By: allanjude
Differential Revision: https://reviews.freebsd.org/D24225
When SMP support for powerpc was added in r178628, the last callers of this
function were removed. All code that needs to manipulate the task priority
just does it directly instead.
Noticed while reading through the lint logs.
Sponsored by: Tag1 Consulting, Inc.
KERN_TLS only supports TCP, so use of the "tls" option with "udp" will
not work. This patch adds a test for this case, so that the mount is not
attempted when both "tls" and "udp" are specified.
These devices have non-pccard attachments. Warn for those as well. Neither an
nor wi does the modern crypto needed to use these cards on secure wifi networks.
an needs firmware from Cisco, which I don't think was ever produced. wi could
in theory do it with raw frames and on-host encryption, but nobody has written
that in the 15 years since WEP was cracked.
MFC After: 3 days
Noticed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D26138
Add deprecation notice for apm bios, aka the apm(4) device. The apm(8)
command will remain, at least for a while, since ACPI emulates the apm
ioctl interface.
Discussed on: arch@
Relnotes: yes
MFC After: 3 days
PDDR (Port Diagnostics Database Register) is used to read the physical
layer debug database, which contains helpful troubleshooting information
regarding the state of the link.
The PDDR register can only be queried when the PCAM register reports it as
supported in its register mask. A new helper macro was added to
the MLX5_CAP_* infrastructure in order to access this mask.
Sponsored by: Mellanox Technologies - Nvidia
MFC after: 1 week
Coverity claims the call to rdma_gid2ip in cma_igmp_send overwrites addr.
Use a consistent definition of sockaddr to prevent detections and code
changes in the future.
Submitted by: bret_ketchum@dell.com
Reported by: Coverity
Reviewed by: hselasky, kib
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26229
As noted in D24652, we currently set shmfd->shm_flags on every
shm_open()/shm_open2(). This wasn't properly thought out; one shouldn't be
able to specify incompatible flags on subsequent opens of non-anon shm.
Move setting of shm_flags explicitly to the two places where shmfds are
created, as we do with seals, and validate when we're opening a pre-existing
mapping that we've either passed no flags or we've passed the exact same
flags as the first time.
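The validation rule can be sketched as follows. The structure, the function
name, and the EINVAL return value are assumptions made for illustration;
they are not taken from the kernel sources.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's shared memory object. */
struct shm_obj {
	int	flags;		/* flags recorded when the object was created */
};

/*
 * Returns 0 if an open of an existing object may proceed, i.e. the
 * caller passed no flags or exactly the creation flags. Otherwise
 * returns EINVAL (the specific errno value is an assumption).
 */
static int
shm_check_open_flags(const struct shm_obj *obj, int open_flags)
{
	if (open_flags != 0 && open_flags != obj->flags)
		return (EINVAL);
	return (0);
}
```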
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D26242
When the zone_jumbop is exhausted, most things using
sosend* (like sshd) will eventually
fail or hang if allocations are limited to the
depleted jumbop zone. This makes it impossible to
communicate with a box which is under an attack which
exhausts the jumbop zone.
Rather than depending on the page size zone, also try cluster
allocations to satisfy larger requests. This allows me
to ssh to, and serve 100Gb/s of traffic from, a server which
is under attack and has had its page-sized zone exhausted.
Reviewed by: glebius, markj, rmacklem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26150
In hv_storvsc_io_request() when coring, prevent changing of the send channel
from the base channel to another one. storvsc_poll always probes on the base
channel.
Based upon conversations with Microsoft, changed the handling of srb_status
codes. Most we should never get, others yes. All are treated as retry-able
except for two. We should not get these statuses, but if we ever do, the I/O
state is not known.
Submitted by: Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
Reviewed by: trasz, allanjude, whu
MFC after: 1 week
Sponsored by: Netapp Inc
Differential Revision: https://reviews.freebsd.org/D25756
To paraphrase the below-referenced PR:
This logic originated in the KAME project, and was even controversial when
it was enabled there by default in 2001. No such equivalent logic exists in
the IPv4 stack, and it turns out that this leads to us dropping valid
traffic when the "point to point" interface is actually a 1:many tun
interface, e.g. with the wireguard userland stack.
Even in the case of true point-to-point links, this logic only avoids
transient looping of packets sent by misconfigured applications or
attackers, which can be subverted by proper route configuration rather than
hardcoded logic in the kernel to drop packets.
In the review, melifaro goes on to note that the kernel can't fix it, so it
perhaps shouldn't try to be 'smart' about it. Additionally, that TTL will
still kick in even with incorrect route configuration.
PR: 247718
Reviewed by: melifaro, rgrimes
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25567
the hook at queue mode was that mn_rx_intr() doesn't run at splnet
level. In today's netgraph the only legitimate reason for queue mode
is recursion avoidance. So I see no reason for queue mode here.
Not tested!
In Linux, ksize() gets the actual amount of memory allocated for a given
object. This commit adds malloc_usable_size() to the FreeBSD KPI, which does
the same. It also maps LinuxKPI ksize() to the newly created function.
The ksize() function is used by drm-kmod.
Reviewed by: hselasky, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26215
Convert two different sysctls to use sbuf. First, for all the default
sysctls we implement for each device driver that's attached. This is a
pure sbuf conversion.
Second, convert sysctl_devices to fill its buffer with sbuf rather
than a hand-rolled crappy thing I wrote years ago.
Reviewed by: cem, markj
Differential Revision: https://reviews.freebsd.org/D26206
devctl_notify_f isn't needed, so retire it. The flags argument is now
unused, so rather than keep it around, retire it. Convert all old
users of it to devctl_notify(). This path no longer sleeps, so is safe
to call from any context. Since it doesn't sleep, it doesn't need to
know if it is OK to sleep or not.
Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140
Convert the memory management of devctl. Rewrite it to make better
use of memory. This eliminates several mallocs (5? worst case) needed
to send a message. It's now possible to always send a message, though
if things are really backed up the oldest message will be dropped to
free up space for the newest.
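The drop-oldest behaviour can be sketched with a small fixed-size queue.
The names and sizes below are hypothetical, and the preallocated slot array
is an assumption about how per-message allocations might be avoided; it is
not the actual devctl implementation.

```c
#include <stddef.h>
#include <string.h>

#define	MSGQ_SLOTS	4	/* illustrative sizes, not the kernel's */
#define	MSGQ_MSGLEN	64

struct msgq {
	char	slots[MSGQ_SLOTS][MSGQ_MSGLEN];
	size_t	head;		/* index of the oldest message */
	size_t	count;		/* number of queued messages */
	size_t	dropped;	/* messages discarded to make room */
};

/* Enqueue never fails: when the queue is full, the oldest entry is dropped. */
static void
msgq_put(struct msgq *q, const char *msg)
{
	size_t slot;

	if (q->count == MSGQ_SLOTS) {
		q->head = (q->head + 1) % MSGQ_SLOTS;
		q->count--;
		q->dropped++;
	}
	slot = (q->head + q->count) % MSGQ_SLOTS;
	strncpy(q->slots[slot], msg, MSGQ_MSGLEN - 1);
	q->slots[slot][MSGQ_MSGLEN - 1] = '\0';
	q->count++;
}

/* Dequeue the oldest message, or return NULL when the queue is empty. */
static const char *
msgq_get(struct msgq *q)
{
	const char *msg;

	if (q->count == 0)
		return (NULL);
	msg = q->slots[q->head];
	q->head = (q->head + 1) % MSGQ_SLOTS;
	q->count--;
	return (msg);
}
```

Because the storage is reserved up front, sending a message needs no
allocation and therefore never has to sleep.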
Add a static bus_child_{location,pnpinfo}_sb to start migrating to
sbuf instead of buffer + length. Use it in the new code. Other code
will be converted later (bus_child_*_str is only used inside of
subr_bus.c, though implemented in ~100 places in the tree).
Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140
No functional changes.
Initially this function was created to perform runtime flag conversions
for the previous incarnation of fib lookup functions. As these functions
got deprecated, move the function to the file with the only remaining
caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the
naming notation.
No functional changes.
net/route/shared.h was created in the initial phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
of functions and structures between the routing subsystem components. At
that time route_var.h was included by many files external to the routing
subsystem, which largely defeated its purpose.
As this is no longer the case and the number of route_var.h includes
is roughly the same as for shared.h, retire the latter in favour of the former.
As nexthops are immutable, some operations such as route attribute changes
require nexthop fetching, forking, modification and route switching.
These operations are not atomic, so they may need to be retried multiple
times in the presence of multiple speakers changing the same route.
This change introduces a "synchronisation" primitive,
route_update_conditional(), simplifying logic for route changes and upcoming
multipath operations.
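The fetch/fork/modify/switch pattern with retries can be sketched as below.
This is not the kernel implementation: the types and function names are
hypothetical, and a C11 compare-and-swap stands in for whatever locking the
routing code actually uses to make the conditional switch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified types: not the kernel's nexthop structures. */
struct nhop {
	int	mtu;
};

struct route {
	_Atomic(struct nhop *) nh;
};

/* Install new_nh only if the route still points at expected. */
static bool
route_swap_if_unchanged(struct route *rt, struct nhop *expected,
    struct nhop *new_nh)
{
	return (atomic_compare_exchange_strong(&rt->nh, &expected, new_nh));
}

/* Change the MTU by forking the nexthop; retry if another writer won. */
static int
route_set_mtu(struct route *rt, int mtu, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		struct nhop *old = atomic_load(&rt->nh);
		struct nhop *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return (-1);
		*copy = *old;		/* fork the immutable nexthop */
		copy->mtu = mtu;	/* modify the private copy */
		if (route_swap_if_unchanged(rt, old, copy))
			return (0);	/* reclaiming old is omitted here */
		free(copy);		/* lost the race, try again */
	}
	return (-1);
}
```

The point of a conditional update is that the caller states which nexthop
it based its change on, so a concurrent change is detected instead of being
silently overwritten.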
Differential Revision: https://reviews.freebsd.org/D26216