- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
If the NVSP protocol version is not greater than NVSP_PROTOCOL_VERSION_2,
then the recv buffer size is 15MB, otherwise the buffer size is 16MB.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4814
This one mainly avoids mbuf cluster allocation for TCP ACKs during
TCP sending tests. And it gives me ~200Mbps improvement (4.7Gbps
-> 4.9Gbps), when running iperf3 TCP sending test w/ 16 connections.
While I'm here, nuke the unnecessary zeroing out pkthdr.csum_flags.
Reviewed by: adrain
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4853
Many applications and kernel modules (e.g. bridge) rely on the ifmedia
status report; give them what they want.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Jun Su <junsu microsoftc com>, me, adrian
Modified by: me (minor)
Original differential: https://reviews.freebsd.org/D4611
Differential Revision: https://reviews.freebsd.org/D4852
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
- Implement the LRO using tcp_lro APIs, and LRO is enabled by default.
- Add several stats sysctl nodes.
- Check IP/TCP length before sending the packet to tcp_lro_rx(), if host
does not provide RX csum information (*); and add an option through
sysctl to always trust host TCP segment csum checks (default is off).
- Add sysctl to control the LRO entry depth; it is disabled by default.
It is used to avoid holding too much TCP segments in driver. Limiting
the LRO entry depth helps a lot in a one/two streams RX test.
This one 3x the RX performance on my local test (3Gbps -> 10Gbps), and
~2x the RX performance over a directly connected 40Ge network (5Gbps ->
9Gbps).
(*) It seems the host stops supplying csum information, once the network
load is high. This still needs investigation...
Reviewed by: Hongjiang Zhang <honzhan microsoft com>,
Dexuan Cui <decui microsoft com>,
Jun Su <junsu microsoft com>,
delphij
Tested by: me (local),
Hongjiang Zhang <honzhan microsoft com>
(directly connected 40Ge)
Approved by: delphij (mentor), adrian (mentor, no objection)
With feedback from: delphij, Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4824
Windows Server 2012 and earlier hosts.
Submitted by: whu
Reviewed by: royger
Approved by: royger
MFC after: 3 days
Relnotes: No
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3086
offload support was introduced in r284746.
While here also fix the ioctl() handler for IPv4 added in r279819,
which was never compiled in given opt_inet.h was not included.
- Bump link state when stopping or starting the interface;
- Don't handle SIOCGIFADDR specially, similar to r277103.
This change is based on a previous revision from Andy Zhang
(Microsoft) who did the diagnostic work and many thanks to
them for their help in supporting the HyperV work.
PR: kern/187203
MFC after: 2 weeks
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
and holding a reference prior to calling further into the hyperv
stack.
Added missing FreeBSD idents.
Submitted by: Microsoft hyperv dev team
Approved by: re@ (gjb)
- change the SI_SUB_RUN_SCHEDULER sysinits in hv_utilc and
hv_netvsc_drv_freebsd.c to SI_SUB_KTHREAD_IDLE, since the
former is no longer in FreeBSD.
The use of these SYSINITs can probably be removed.
be pulled into FreeBSD. From now, FreeBSD will be considered the
upstream repo.
First step: move the drivers away from the contrib area and into
the base system.
A follow-on commit will include the drivers in the amd64 GENERIC kernel.