Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
switch to it when processing the reply (do_act_open_rpl or
do_act_establish).
Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
listening socket.
CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
associated with one of the card's ifnets.
Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
so t4_push_frames already runs in the proper vnet context. One exception
is when incoming credits trigger tx within the driver's ithread. Set
the vnet explicitly in do_fw4_ack for that case.
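
The passive-open rules above can be sketched in userland C as follows.
This is only an illustration under stated assumptions: every type and
function carrying a _sketch suffix is hypothetical and does not exist in
the driver, and the two assignments to cur_vnet_sketch merely stand in
for what CURVNET_SET() and CURVNET_RESTORE() do in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vnet and curvnet. */
struct vnet_sketch { int id; };
static struct vnet_sketch *cur_vnet_sketch;	/* models "curvnet" */

struct listen_ctx_sketch {
	struct vnet_sketch *vnet;	/* saved when the listener was set up */
	int syns_accepted;
};

struct ifnet_sketch { struct vnet_sketch *vnet; };

/*
 * Process an incoming SYN.  Reject it if the receiving interface is in a
 * different vnet than the listener; otherwise do the work in the
 * listener's vnet and restore the previous context afterwards.
 */
static bool
handle_syn_sketch(struct listen_ctx_sketch *lctx,
    const struct ifnet_sketch *ifp)
{
	struct vnet_sketch *saved;

	if (ifp->vnet != lctx->vnet)
		return (false);

	saved = cur_vnet_sketch;
	cur_vnet_sketch = lctx->vnet;	/* like CURVNET_SET() */
	lctx->syns_accepted++;		/* work done in the listener's vnet */
	cur_vnet_sketch = saved;	/* like CURVNET_RESTORE() */
	return (true);
}
```

The same save-at-setup, switch-at-processing pattern would apply to the
active open case, with the vnet taken from the socket at connect time.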
MFC after: 3 days
Sponsored by: Chelsio Communications
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer. This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer. The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.
Because these mbufs do not have an associated virtual address, m_data
is not valid. Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().
An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job). The
special mbufs reference this structure via m_ext. Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size). The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer. The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.
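
The reference-counting scheme described above can be modelled roughly
like this. It is a userland sketch, not the driver's code: the _sketch
structures are hypothetical simplifications of aiotx_buffer and of the
special mbufs, the 'off' field plays the role of ext_arg2, and setting
job_done stands in for completing the AIO job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct aiotx_buffer_sketch {
	size_t len;		/* total bytes in the user buffer */
	unsigned refs;		/* one reference per outstanding mbuf */
	bool job_done;		/* set when the AIO job would be completed */
};

struct mbuf_sketch {
	struct aiotx_buffer_sketch *buf;	/* backing buffer */
	size_t off;				/* role of ext_arg2 */
	size_t len;
};

/*
 * Split the buffer into mbufs of at most 'chunk' bytes (chunk > 0),
 * taking one reference per mbuf.  Returns the number of mbufs used.
 */
static size_t
split_job_sketch(struct aiotx_buffer_sketch *buf, size_t chunk,
    struct mbuf_sketch *out, size_t max)
{
	size_t n = 0, off = 0;

	while (off < buf->len && n < max) {
		size_t len = buf->len - off < chunk ? buf->len - off : chunk;

		out[n].buf = buf;
		out[n].off = off;
		out[n].len = len;
		buf->refs++;
		off += len;
		n++;
	}
	return (n);
}

/* Drop one mbuf's reference; the last release completes the job. */
static void
free_mbuf_sketch(struct mbuf_sketch *m)
{
	assert(m->buf->refs > 0);
	if (--m->buf->refs == 0)
		m->buf->job_done = true;
}
```

For example, a 10000-byte job split with a 4096-byte chunk size yields
three mbufs at offsets 0, 4096 and 8192, and the job is only marked done
once all three have been freed.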
Zero-copy aio_write() requests for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.
Here N is the unit number of the adapter's nexus device (t4nex or t5nex).
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
related to "shared" CPLs.
a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.
b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.
c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to support any
wrq/iq combination.
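
Item (c) can be pictured with the following sketch. It assumes a
simplified model in which one opcode is "shared" and all others are
resolved from a global table; all names ending in _sketch, the two
opcodes, and the return values are made up for illustration and are not
the driver's identifiers.

```c
#include <stddef.h>

enum { OP_SHARED_SKETCH = 0, OP_PLAIN_SKETCH = 1, NUM_OPS_SKETCH = 2 };

struct iq_sketch;
typedef int (*cpl_handler_sketch_t)(struct iq_sketch *, unsigned tid);

struct iq_sketch {
	cpl_handler_sketch_t shared_handler;	/* per-iq, for shared CPLs */
};

/* Global table indexed by opcode, filled once at module load. */
static cpl_handler_sketch_t global_handlers_sketch[NUM_OPS_SKETCH];

/* Handler a base-driver queue might install: tid is a filter tid. */
static int
filter_rpl_sketch(struct iq_sketch *iq, unsigned tid)
{
	(void)iq;
	return ((int)tid + 1000);
}

/* Handler an offload rx queue might install: tid is a connection tid. */
static int
tom_rpl_sketch(struct iq_sketch *iq, unsigned tid)
{
	(void)iq;
	return ((int)tid + 2000);
}

/* Handler for an opcode whose destination is clear from the opcode. */
static int
plain_sketch(struct iq_sketch *iq, unsigned tid)
{
	(void)iq;
	return ((int)tid);
}

/* Shared opcodes use the queue's own handler, others the global table. */
static int
dispatch_sketch(struct iq_sketch *iq, int opcode, unsigned tid)
{
	cpl_handler_sketch_t h;

	if (opcode < 0 || opcode >= NUM_OPS_SKETCH)
		return (-1);
	h = (opcode == OP_SHARED_SKETCH) ? iq->shared_handler :
	    global_handlers_sketch[opcode];
	return (h != NULL ? h(iq, tid) : -1);
}
```

With this layout the same shared opcode arriving on two different queues
reaches two different handlers, each of which knows how to interpret the
tid in the reply.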
Approved by: re@ (kib@)
Sponsored by: Chelsio Communications
Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers. This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives. aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.
When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic. The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled. The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired. If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers). The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer. If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired. This ensures that no
more than two user buffers per socket are ever wired.
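
The buffer-matching and eviction policy described above might look
roughly like the following. This is a userland model with hypothetical
names (everything ending in _sketch); wiring and unwiring pages is
reduced to a flag and a counter, and a simple logical clock stands in
for whatever recency tracking the real code uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ddp_buf_sketch {
	bool wired;
	unsigned vm_gen;		/* vmspace generation when wired */
	uintptr_t addr;
	size_t len;
	unsigned long last_used;
};

struct ddp_cache_sketch {
	struct ddp_buf_sketch slot[2];
	unsigned long clock;
	unsigned wire_ops;		/* times a new buffer was wired */
};

/* Return the slot holding the buffer, wiring it first if necessary. */
static int
ddp_get_buf_sketch(struct ddp_cache_sketch *c, unsigned vm_gen,
    uintptr_t addr, size_t len)
{
	struct ddp_buf_sketch *b;
	int i, victim = 0;

	c->clock++;
	for (i = 0; i < 2; i++) {
		b = &c->slot[i];
		if (b->wired && b->vm_gen == vm_gen && b->addr == addr &&
		    b->len == len) {
			b->last_used = c->clock;	/* reuse as-is */
			return (i);
		}
	}

	/* No match: prefer an empty slot, else the least recently used. */
	for (i = 0; i < 2; i++) {
		if (!c->slot[i].wired) {
			victim = i;
			break;
		}
		if (c->slot[i].last_used < c->slot[victim].last_used)
			victim = i;
	}

	/* The old buffer is unwired here before the new one is wired. */
	b = &c->slot[victim];
	b->wired = true;
	b->vm_gen = vm_gen;
	b->addr = addr;
	b->len = len;
	b->last_used = c->clock;
	c->wire_ops++;
	return (victim);
}
```

An application alternating between two buffers pays the wiring cost
twice and then keeps hitting the cache; a third buffer, or the same
address under a new vmspace generation, displaces the older entry.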
Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.
Discussed with: np
Relnotes: yes
Sponsored by: Chelsio Communications
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.
Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).
T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).
One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are now accounted to the netmap interface.
The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.
The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.
Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio
Both are used to protect access to IP address lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
Drivers (ULDs) and the base if_cxgbe driver.
Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.
iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD). The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
or iSCSI. Any such attempt will fail with EBUSY.
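
Rules (a) through (c) can be expressed over a bitmask in the spirit of
"active_ulds". The sketch below is an assumption-laden model: the
adapter structure, the ULD ids, the available_ulds field and all three
functions are hypothetical and only mirror the rules as stated here.

```c
#include <errno.h>

enum { ULD_TOM_SKETCH = 0, ULD_IWARP_SKETCH = 1, ULD_ISCSI_SKETCH = 2 };

#define	ULD_BIT_SKETCH(id)	(1u << (id))
#define	ULD_DEPENDENTS_SKETCH \
	(ULD_BIT_SKETCH(ULD_IWARP_SKETCH) | ULD_BIT_SKETCH(ULD_ISCSI_SKETCH))

struct adapter_sketch {
	unsigned active_ulds;		/* one bit per active ULD */
	unsigned available_ulds;	/* ULD modules currently loaded */
};

/* Rule (a): enabling TOE also enables the dependent ULDs that are loaded. */
static void
enable_toe_sketch(struct adapter_sketch *sc)
{
	sc->active_ulds |= ULD_BIT_SKETCH(ULD_TOM_SKETCH);
	sc->active_ulds |= sc->available_ulds & ULD_DEPENDENTS_SKETCH;
}

/* Rule (b): a dependent ULD loaded later activates itself if TOE is on. */
static void
load_uld_sketch(struct adapter_sketch *sc, int id)
{
	sc->available_ulds |= ULD_BIT_SKETCH(id);
	if (id != ULD_TOM_SKETCH &&
	    (sc->active_ulds & ULD_BIT_SKETCH(ULD_TOM_SKETCH)) != 0)
		sc->active_ulds |= ULD_BIT_SKETCH(id);
}

/* Rule (c): TOM cannot be deactivated underneath iWARP or iSCSI. */
static int
deactivate_tom_sketch(struct adapter_sketch *sc)
{
	if ((sc->active_ulds & ULD_DEPENDENTS_SKETCH) != 0)
		return (EBUSY);
	sc->active_ulds &= ~ULD_BIT_SKETCH(ULD_TOM_SKETCH);
	return (0);
}
```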
MFC after: 2 weeks
cannot be sent to the chip because a prerequisite L2 resolution
failed.
Submitted by: Hariprasad at chelsio dot com (original version)
MFC after: 2 weeks
- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
had to be reworked as the kernel calls ifc_match with the global
if_cloners_mtx held.
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.
The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver. T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.
Sponsored by: Chelsio
This is the Compressed Local IPv6 table on the chip. To save space, the
chip uses an index into this table instead of a full IPv6 address in
some of its hardware data structures.
For now the driver fills this table with all the local IPv6 addresses
that it sees at the time the table is initialized. I'll improve this
later so that the table is updated whenever new IPv6 addresses are
configured or existing ones deleted.
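
The idea of referring to an address by table index can be sketched as
below. The table size, structure and function are illustrative
assumptions only and say nothing about the hardware's actual layout.

```c
#include <string.h>

#define	CLIP_SIZE_SKETCH	4	/* illustrative, not the chip's size */

struct clip_table_sketch {
	unsigned char addr[CLIP_SIZE_SKETCH][16];	/* full IPv6 addresses */
	int used;
};

/*
 * Return the index for 'a', adding it to the table if it is not there
 * yet.  Returns -1 when the table is full.
 */
static int
clip_get_sketch(struct clip_table_sketch *t, const unsigned char a[16])
{
	int i;

	for (i = 0; i < t->used; i++) {
		if (memcmp(t->addr[i], a, 16) == 0)
			return (i);
	}
	if (t->used == CLIP_SIZE_SKETCH)
		return (-1);
	memcpy(t->addr[t->used], a, 16);
	return (t->used++);
}
```

Other data structures can then carry the small integer returned here
instead of the 16-byte address, which is where the space saving comes
from.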
MFC after: 1 week
- Teach find_best_mtu_idx() to deal with IPv6 endpoints.
- Install correct protosw in offloaded TCP/IPv6 sockets when DDP is
enabled.
- Move set_tcp_ddp_ulp_mode to t4_tom.c so that t4_tom.h can be included
without having to drag in t4_msg.h too. This was bothering the iWARP
driver for some reason.
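
Why the address family matters for the MTU lookup can be shown with a
small sketch. It assumes, without confirmation from this log, that the
lookup converts an MSS to an MTU by adding the IP and TCP header sizes
and then picks the largest table entry that does not exceed it. The
function name, the ascending-table requirement and the header sizes
without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP) are the
assumptions here.

```c
/*
 * Pick the index of the largest MTU table entry that is <= the MTU
 * implied by 'mss'.  'mtus' must be sorted in ascending order and have
 * n >= 1 entries; index 0 is returned if nothing larger fits.
 */
static int
best_mtu_idx_sketch(const unsigned short *mtus, int n, unsigned mss,
    int is_ipv6)
{
	unsigned mtu = mss + (is_ipv6 ? 40u : 20u) + 20u;
	int i, best = 0;

	for (i = 1; i < n && mtus[i] <= mtu; i++)
		best = i;
	return (best);
}
```

With a table of 576, 1280, 1500 and 9000, an MSS of 1440 maps to the
1500-byte entry for IPv6 but only to the 1280-byte entry for IPv4,
because the IPv4 header is 20 bytes shorter.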
MFC after: 1 week
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.
Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete). This guarantees
that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
power of 2.
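
The power-of-2 restriction amounts to a check like the one below. The
function names and the choice of EINVAL as the error are assumptions
made for this sketch, not a statement about the driver's code.

```c
#include <errno.h>
#include <stdbool.h>

/* A power of 2 has exactly one bit set. */
static bool
is_power_of_2_sketch(unsigned n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/* Accept the requested queue size only if it is a power of 2. */
static int
set_txq_size_sketch(unsigned *qsize, unsigned requested)
{
	if (!is_power_of_2_sketch(requested))
		return (EINVAL);
	*qsize = requested;
	return (0);
}
```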
MFC after: 1 week
Basically, this is automatic rx zero copy when feasible. TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.
- Works with sockets that are being handled by the TCP offload engine
of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
"ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default. Use hw.t4nex.<X>.toe.ddp="1" to enable it.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the
works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn
Sponsored by: Chelsio Communications
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)