Commit graph

48 commits

Author SHA1 Message Date
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
John Baldwin
2ff447ee3b cxgbe: Enable TOE TLS RX when an RX key is provided via setsockopt().
Rather than requiring a socket to be created as a TLS socket from the
get go, switch a TOE socket from "plain" TOE to TLS mode when a
receive key is added to the socket.

The firmware is only able to switch a "plain" TOE connection to TLS
mode if the head of the pending socket data is the start of a TLS
record, so the connection is migrated to TLS mode as a multi-step
process.

When TOE TLS RX is enabled, the associated connection's receive side
is frozen via a flag in the TCB.  The state of the socket buffer is
then examined to determine if the pending data in the socket buffer
ends on a TLS record boundary.  If so, the connection is migrated to
TLS mode and unfrozen.  Otherwise, the connection is unfrozen
temporarily until more data arrives.  Once more data arrives, the
receive queue is frozen again and rechecked.  This continues until the
connection is paused at a record boundary.  Any records received
before TLS mode is enabled are decrypted as software records.

Note that this removes the 'rx_tls_ports' sysctl.  TOE TLS offload for
receive is now enabled automatically on existing TOE connections when
using a KTLS-aware SSL library just as it was previously enabled
automatically for TLS transmit.  This also enables TLS offload for TOE
connections which enable TLS after passing initial data in the clear
(e.g. STARTTLS with SMTP).

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37351
2022-11-15 12:08:51 -08:00
John Baldwin
5b27e4b27c cxgbei: Support for ISO (iSCSI segmentation offload).
ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31223
2021-08-06 14:21:37 -07:00
Navdeep Parhar
557c4521bb cxgbe/t4_tom: Implement tod_pmtu_update.
tod_pmtu_update was added to the kernel in 01d74fe1ff.

Sponsored by:	Chelsio Communications
2021-04-22 14:48:57 -07:00
John Baldwin
077ba6a845 cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues.  Currently it only holds a work queue but will
include additional state in future changes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29382
2021-03-26 15:19:58 -07:00
John Baldwin
0082e479ef Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode.  This provides
a fallback if the userland SSL library does not support KTLS.  In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.

The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27470
2020-12-03 22:06:08 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Navdeep Parhar
b0dede77b1 cxgbe/iw_cxgbe: Add an async callback to notify iw_cxgbe in case of a
fatal error.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-05-19 16:28:20 +00:00
John Baldwin
bddf73433e NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.

NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine.  Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS.  This permits using the
TOE to segment the encrypted TLS records.  However, this approach does
have some limitations:

1) Regular TOE sockets cannot be used when the TOE is in this special
   mode.  One can use either TOE and TOE-based KTLS or NIC KTLS, but
   not both at the same time.

2) In NIC KTLS mode, the TOE is only able to accept a per-connection
   timestamp offset that varies in the upper 4 bits.  Put another way,
   only connections whose timestamp offset has the 28 lower bits
   cleared can use NIC KTLS and generate correct timestamps.  The
   driver will refuse to enable NIC KTLS on connections with a
   timestamp offset with any of the lower 28 bits set.  To use NIC
   KTLS, users can either disable TCP timestamps by setting the
   net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
   tcp_new_ts_offset() function to clear the lower 28 bits of the
   generated offset.

3) Because the TCP segmentation relies on fields mirrored in a TCB in
   the TOE, not all fields in a TCP packet can be sent in the TCP
   segments generated from a TLS record.  Specifically, for packets
   containing TCP options other than timestamps, the driver will
   inject an "empty" TCP packet holding the requested options (e.g. a
   SACK scoreboard) along with the segments from the TLS record.
   These empty TCP packets are counted by the
   dev.cc.N.txq.M.kern_tls_options sysctls.

Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer.  The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records.  In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.

TCP packets for TLS requests are broken down into the following
classes (with associated counters):

- Mbufs that send an entire TLS record in full do not have any waste
  bytes (dev.cc.N.txq.M.kern_tls_full).

- Mbufs that send a short TLS record that ends before the end of the
  trailer (dev.cc.N.txq.M.kern_tls_short).  For sockets using AES-CBC,
  the encryption must always start at the beginning, so if the mbuf
  starts at an offset into the TLS record, the offset bytes will be
  "waste" bytes.  For sockets using AES-GCM, the encryption can start
  at the 16 byte block before the starting offset capping the waste at
  15 bytes.

- Mbufs that send a partial TLS record that has a non-zero starting
  offset but ends at the end of the trailer
  (dev.cc.N.txq.M.kern_tls_partial).  In order to compute the
  authentication hash stored in the trailer, the entire TLS record
  must be sent as input to the crypto engine, so the bytes before the
  offset are always "waste" bytes.

In addition, other per-txq sysctls are provided:

- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
  using AES-CBC.

- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
  using AES-GCM.

- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
  compensate for the TOE engine not being able to set FIN on the last
  segment of a TLS record if the TLS record mbuf had FIN set.

- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
  txq including full, short, and partial records.

- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
  and payload) sent for TLS record requests.

- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
  record requests.

To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:

hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21962
2019-11-21 19:30:31 +00:00
John Baldwin
e38a50e8b6 Split Chelsio send tags into a generic base tag and a ratelimit tag.
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a
distinct tag from a ratelimit tag.  To support this, refactor
cxgbe_snd_tag to be a simple send tag with a type and convert the
existing ratelimit tag to a new cxgbe_rate_tag structure.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22072
2019-10-22 20:41:54 +00:00
Navdeep Parhar
be09e82abb cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to
compile but wouldn't load.  A separate commit will fix tcp_var.h

Reported By: Dustin Marquess (dmarquess at gmail)

Sponsored by:	Chelsio Communications
2019-03-29 16:43:24 +00:00
Navdeep Parhar
51347c3ff1 cxgbe(4): Use two hashes instead of a table to keep track of
hashfilters.  Two because the driver needs to look up a hashfilter by
its 4-tuple or tid.

A couple of fixes while here:
- Reject attempts to add duplicate hashfilters.
- Do not assume that any part of the 4-tuple that isn't specified is 0.
  This makes it consistent with all other mandatory parameters that
  already require explicit user input.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2018-08-15 03:03:01 +00:00
Navdeep Parhar
5fc0f72f3b cxgbe(4): Add support for high priority filters on T6+. They have their
own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.

These filters "hit" before anything else in the LE's lookup.  The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-09 14:19:47 +00:00
Navdeep Parhar
0c71c9ccb2 cxgbe(4): Improvements in TID management.
- Ignore any type of TID where the start/end values are not in the
  correct order.  There are situations where the firmware isn't able to
  reserve room for the number requested in the config file but doesn't
  report a failure during configuration and instead sets end <= start.

- Track start/end in tid_tab and remove some redundant copies from
  adapter->params.

- Move all the start/end and other read-only parameters to a quiet part
  of tid_tab, away from the tid locks.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-02 22:52:05 +00:00
Navdeep Parhar
786099de5e cxgbe(4): Data path for rate-limited tx.
This is hardware support for the SO_MAX_PACING_RATE sockopt (see
setsockopt(2)), which is available in kernels built with "options
RATELIMIT".

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2018-05-24 10:18:14 +00:00
Navdeep Parhar
67e071128d cxgbe(4): Implement ifnet callbacks that deal with send tags.
An etid (ethoffload tid) is allocated for a send tag and it acquires a
reference on the traffic class that matches the send parameters
associated with the tag.

Sponsored by:	Chelsio Communications
2018-05-18 06:09:15 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Navdeep Parhar
111638bf68 cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch
the CPL.  This allows all CPLs that use an atid to be used as shared
CPLs, although ACT_OPEN_RPL is the only one being converted in this
revision.

Sponsored by:	Chelsio Communications
2018-04-30 21:47:30 +00:00
Navdeep Parhar
1131c927c4 cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload.  t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface.  The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so.  The general format
of a rule is: [socket-type] expr => settings

Example.  See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match.  There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by:	Chelsio Communications
2018-04-14 19:07:56 +00:00
John Baldwin
1e9538d253 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
John Baldwin
198729ea7d Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for
storing TLS keys.  The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.

Sponsored by:	Chelsio Communications
2018-02-26 21:56:06 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Navdeep Parhar
5c2bacde58 Update the iw_cxgbe bits in the projects branch.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2017-11-07 23:52:14 +00:00
Navdeep Parhar
3ef7429927 cxgbe/t4_tom: Add a knob to select the congestion control algorigthm
used by the TOE hardware for fully offloaded connections.  The knob
affects new connections only.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 20:33:22 +00:00
Navdeep Parhar
0a600b6312 cxgbe: Query some more RDMA related parameters from the firmware.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-13 17:16:36 +00:00
Navdeep Parhar
7cba15b16e cxgbe/cxgbei: Retire all DDP related code from cxgbei and switch to
routines available in t4_tom to manage the iSCSI DDP page pod region.

This adds the ability to use multiple DDP page sizes to the iSCSI
driver, among other improvements.

Sponsored by:	Chelsio Communications
2016-09-01 20:43:01 +00:00
John Baldwin
07159830be Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer.  This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer.  The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.

Because these mbufs do not have an associated virtual address, m_data
is not valid.  Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().

An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job).  The
special mbufs reference this structure via m_ext.  Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size).  The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer.  The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.

Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2016-07-27 18:29:35 +00:00
John Baldwin
dc9643853d Use DDP to implement zerocopy TCP receive with aio_read().
Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers.  This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives.  aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.

When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic.  The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled.  The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired.  If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers).  The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer.  If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired.  This ensures that no
more than two user buffers per socket are ever wired.

Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.

Discussed with:	np
Relnotes:	yes
Sponsored by:	Chelsio Communications
2016-05-07 00:33:35 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
John Baldwin
80f3b01958 Remove #ifdef's from various structures used in the cxgbe/cxl driver.
This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.

Discussed with:	np
Sponsored by:	Chelsio Communications
2016-03-31 18:36:50 +00:00
Navdeep Parhar
9eb533d3b4 cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver.  These changes come from projects/cxl_iscsi.
2015-12-26 00:26:02 +00:00
Navdeep Parhar
b3d44a6800 cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD).  The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
   iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
   adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
   or iSCSI.  Any such attempt will fail with EBUSY.

MFC after:	2 weeks
2015-02-08 09:28:55 +00:00
Navdeep Parhar
a2355dd909 cxgbe(4): reserve id for iSCSI upper layer driver. 2015-02-05 08:52:20 +00:00
Navdeep Parhar
79b93bf6a3 cxgbe/tom: do not engage the TOE's payload chopper for payload < 2 MSS
or for 10Gbps ports.

MFC after:	2 weeks
2015-01-03 00:09:21 +00:00
Navdeep Parhar
0fe982772d Some hooks in cxgbe(4) for the offloaded iSCSI driver.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).

Submitted by:	Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by:	Chelsio Communications.
2014-07-24 18:39:08 +00:00
Navdeep Parhar
93e9cae3fa Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.

MFC after:	2 weeks
2013-12-14 03:08:03 +00:00
Navdeep Parhar
9800517691 Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.
2013-08-28 20:45:45 +00:00
Navdeep Parhar
6eb3180fb2 - Make note of interface MTU change if the rx queues exist, and not just
when the interface is up.
- Add a tunable to control the TOE's rx coalesce feature (enabled by
  default as it always has been).  Consider the interface MTU or the
  coalesce size when deciding which cluster zone to use to fill the
  offload rx queue's free list.  The tunable is:
  dev.{t4nex,t5nex}.<N>.toe.rx_coalesce

MFC after:	1 day
2013-07-04 21:19:01 +00:00
Navdeep Parhar
e0f8a7f4da cxgbe/tom: Fix bad signed/unsigned mixup in the stid allocator. This
fixes a panic when allocating a mixture of IPv6 and IPv4 stids.

MFC after:	1 week
2013-06-08 07:23:26 +00:00
Navdeep Parhar
0a0a697c73 cxgbe(4): Updates to the hardware L2 table management code.
- Add full support for IPv6 addresses.

- Read the size of the L2 table during attach.  Do not assume that PCIe
  physical function 4 of the card has all of the table to itself.

- Use FNV instead of Jenkins to hash L3 addresses and drop the private
  copy of jhash.h from the driver.

MFC after:	1 week
2013-01-14 20:36:22 +00:00
Navdeep Parhar
c66c36a454 Overhaul the stid allocator so that it can be used for IPv6 servers
too.  The entry for an IPv6 server in the TCAM takes up the equivalent
of two ordinary stids and must be properly aligned too.

MFC after:	1 week
2013-01-11 00:07:01 +00:00
Navdeep Parhar
b174b65819 cxgbe(4): Add functions to help synchronize "slow" operations (those not
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.

Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete).  This guarantees
  that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
  power of 2.

MFC after:	1 week
2013-01-10 23:56:50 +00:00
Navdeep Parhar
e682d02e12 Support for TCP DDP (Direct Data Placement) in the T4 TOE module.
Basically, this is automatic rx zero copy when feasible.  TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.

- Works with sockets that are being handled by the TCP offload engine
  of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
  "ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default.  Use hw.t4nex.<X>.toe.ddp="1" to enable it.
2012-08-17 00:49:29 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Navdeep Parhar
733b92779e Many updates to cxgbe(4)
- Device configuration via plain text config file.  Also able to operate
  when not attached to the chip as the master driver.

- Generic "work request" queue that serves as the base for both ctrl and
  ofld tx queues.

- Generic interrupt handler routine that can process any event on any
  kind of ingress queue (via a dispatch table).

- A couple of new driver ioctls.  cxgbetool can now install a firmware
  to the card ("loadfw" command) and can read the card's memory
  ("memdump" and "tcb" commands).

- Lots of assorted information within dev.t4nex.X.misc.*  This is
  primarily for debugging and won't show up in sysctl -a.

- Code to manage the L2 tables on the chip.

- Updates to cxgbe(4) man page to go with the tunables that have changed.

- Updates to the shared code in common/

- Updates to the driver-firmware interface (now at fw 1.4.16.0)

MFC after:	1 month
2011-12-16 02:09:51 +00:00
Navdeep Parhar
4dba21f17e L2 table code. This is enough to get the T4's switch + L2 rewrite
filters working.  (All other filters - switch without L2 info rewrite,
steer, and drop - were already fully-functional).

Some contrived examples of "switch" filters with L2 rewriting:

# cxgbetool t4nex0  iport 0  dport 80  action switch  vlan +9  eport 3
Intercept all packets received on physical port 0 with TCP port 80 as
destination, insert a vlan tag with VID 9, and send them out of port 3.

# cxgbetool t4nex0  sip 192.168.1.1/32  ivlan 5  action switch \
	vlan =9  smac aa:bb:cc:dd:ee:ff  eport 0
Intercept all packets (received on any port) with source IP address
192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac
to aa:bb:cc:dd:ee:ff, and send it out of port 0.

MFC after:	1 week
2011-05-30 21:07:26 +00:00
Navdeep Parhar
8820ce5fe7 T4 packet filtering/steering.
- Enable 5-tuple and every-packet lookup.

- Setup the default filter mode to allow filtering/steering based on IP
  protocol, ingress port, inner VLAN ID, IP frag, FCoE, and MPS match
  type; all combined together.  You can also filter based on MAC index,
  Ethernet type, IP TOS/IPv6 Traffic Class, and outer VLAN ID but you'll
  have to modify the default filter mode and exclude some of the
  match-fields in it.

  IPv4 and IPv6 SIP/DIP/SPORT/DPORT are always available in all filter
  rules.

- Add driver ioctls to get/set the global filter mode.

- Add driver ioctls to program and delete hardware filters.  A couple of
  the "switch" actions that rewrite Ethernet and VLAN information and
  switch the packet out of another port may not work as the L2 code is not
  yet in place.  Everything else, including all "drop" and "pass" rules
  with RSS or absolute qid, should work.

Obtained from:	 Chelsio Communications
2011-05-05 02:04:56 +00:00
Navdeep Parhar
54e4ee7163 cxgbe(4) - NIC driver for Chelsio T4 (Terminator 4) based 10Gb/1Gb adapters.
MFC after:	3 weeks
2011-02-18 08:00:26 +00:00