538 commits

Author SHA1 Message Date
Mark Johnston
01f43479b5 ipsec: Drain async ipsec_offload work when destroying a vnet
Re-apply commit e196b12f4d.  This was reverted by commit 28294dc924
because it could trigger a deadlock, but the underlying problem there
was fixed in commit f76826b892.

Reported by:	KASAN
Reviewed by:	kib
Fixes:		ef2a572bf6 ("ipsec_offload: kernel infrastructure")
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D46483
2024-09-04 14:28:28 +00:00
Konstantin Belousov
f76826b892 ipsec offload: use private taskqueue thread
Using global taskqueue_thread XXX with the vnet tasks scheduled during
VNET destruction.  VNET shutdown needs to wait for all vnet-scoped
SAs/SPs to be handled, and doing that from taskqueue_thread task
deadlocks because the same thread processes the removals.

Reviewed by:	markj
Sponsored by:	NVidia networking
Differential revision:	https://reviews.freebsd.org/D46494
2024-09-04 11:49:38 +03:00
Konstantin Belousov
1af77be327 ipsec_offlad: remove not needed IFP_HS_INPUT/OUTPUT flags
Calculate the hdr_ext_size unconditionally, it is kept unused for SAs
not handling the input.

Sponsored by:	NVidia networking
2024-09-04 11:49:38 +03:00
Konstantin Belousov
d02e1a3ffa ipsec_accel_output(): do not process packet if interface rejected offload
Sponsored by:	NVidia networking
2024-09-04 11:49:38 +03:00
Mark Johnston
28294dc924 Revert "ipsec: Drain async ipsec_offload work when destroying a vnet"
This change can cause a deadlock in some cases, since it's possible for
VNET teardown to happen in the context of taskqueue_thread, and
ipsec_accel_sync() drains taskqueue_thread's work queue.

This reverts commit e196b12f4d.
2024-08-30 15:00:16 +00:00
Mark Johnston
e196b12f4d ipsec: Drain async ipsec_offload work when destroying a vnet
The ipsec_offload code in some cases releases object references in an
asynchronous context where it needs to set the current VNET.  Make sure
that all such work completes before the VNET is actually destroyed,
otherwise a use-after-free is possible.

Reported by:	KASAN
Reviewed by:	kib
Fixes:		ef2a572bf6 ("ipsec_offload: kernel infrastructure")
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D46483
2024-08-30 13:08:20 +00:00
Kristof Provost
b1c3a4d75f netipsec: add probe points for the ipsec/esp/ah/ipcomp counters
Extend what we did for netinet counters in 60d8dbbef0 (netinet: add a probe
point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters, 2024-01-18) to the
IPsec code.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D46416
2024-08-28 12:02:45 +02:00
Konstantin Belousov
66f0e2017f ipsec_offload: add ipsec_accel_drv_sa_lifetime_fetch()
A function to fetch hardware counters for offloaded SA on specific
interface.

Sponsored by:	NVidia networking
2024-08-20 15:42:13 +03:00
Konstantin Belousov
c4a0ee9b97 ipsec_offload: add handler for interface down events
Remove all offloaded SAs and SPs on ifdown.

Sponsored by:	NVIDIA networking
2024-08-20 15:42:12 +03:00
Konstantin Belousov
65f264dcf7 ipsec_offload: indirect two more functions on the ipsec.ko module load
Specifically, ipsec_accel_on_ifdown() and ipsec_accel_drv_sa_lifetime_update()
should be present in kernel for future mlx5en driver to be statically
linkable into the kernel built with IPSEC_HOOKS + IPSEC_OFFLOAD.

Sponsored by:	NVIDIA networking
2024-07-30 17:59:49 +03:00
Konstantin Belousov
9a7096ff54 ipsec_offload: hide SA/SP offload lifecycle prints under verbose sysctl
Reported and reviewed by:	kp
Discussed with:	np
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D46045
2024-07-21 11:53:19 +03:00
Konstantin Belousov
6023bd1d52 netipsec: move declaration of the sysctl net.inet{,6}.ipsec nodes to header
Reviewed by:	kp
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D46045
2024-07-21 11:53:19 +03:00
Konstantin Belousov
eb0fdc7753 netinet/ipsec.h: remove unneeded "extern"s
Reviewed by:	kp
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D46045
2024-07-21 11:53:19 +03:00
Konstantin Belousov
e6e2c0a5ef ipsec_offload: switch TF2_IPSEC_TSO on/off as appropriate on output
after the interface ipsec_accel method if_hwassist() is consulted.

Sponsored by:	NVIDIA networking
2024-07-12 07:27:59 +03:00
Konstantin Belousov
240b7bfe56 ipsec_offload: offload inner checksums calculations for UDP/TCP/TSO
and allow the interface driver to declare such support.

Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44221
2024-07-12 07:27:58 +03:00
Konstantin Belousov
ef2a572bf6 ipsec_offload: kernel infrastructure
Inline IPSEC offload moves almost whole IPSEC processing from the
CPU/MCU and possibly crypto accelerator, to the network card.

The transmitted packet content is not touched by CPU during TX
operations, kernel only does the required policy and security
association lookups to find out that given flow is offloaded, and then
packet is transmitted as plain text to the card. For driver convenience,
a metadata is attached to the packet identifying SA which must process
the packet. Card does encryption of the payload, padding, calculates
authentication, and does the reformat according to the policy.

Similarly, on receive, card does the decapsulation, decryption, and
authentication.  Kernel receives the identifier of SA that was
used to process the packet, together with the plain-text packet.

Overall, payload octets are only read or written by card DMA engine,
removing a lot of memory subsystem overhead, and saving CPU time because
IPSEC algos calculations are avoided.

If driver declares support for inline IPSEC offload (with the
IFCAP2_IPSEC_OFFLOAD capability set and registering method table struct
if_ipsec_accel_methods), kernel offers the SPD and SAD to driver.
Driver decides which policies and SAs can be offloaded based on
hardware capacity, and acks/nacks each SA for given interface to
kernel.  Kernel needs to keep this information to make a decision to
skip software processing on TX, and to assume processing already done
on RX.  This shadow SPD/SAD database of offloads is rooted from
policies (struct secpolicy accel_ifps, struct ifp_handle_sp) and SAs
(struct secasvar accel_ipfs, struct ifp_handle_sav).

Some extensions to the PF_KEY socket allow to limit interfaces for
which given SP/SA could be offloaded (proposed for offload).  Also,
additional statistics extensions allow to observe allocation/octet/use
counters for specific SA.

Since SPs and SAs are typically instantiated in non-sleepable context,
while offloading them into card is expected to require costly async
manipulations of the card state, calls to the driver for offload and
termination are executed in the threaded taskqueue.  It also solves
the issue of allocating resources needed for the offload database.
Neither ifp_handle_sp nor ifp_handle_sav adds a reference to the
owning SP/SA, so the offload must be terminated before last reference is
dropped.  ipsec_accel only adds transient references to ensure safe
pointer ownership by taskqueue.

Maintaining the SA counters for hardware-accelerated packets is the
duty of the driver.  The helper ipsec_accel_drv_sa_lifetime_update()
is provided to hide accel infrastructure from drivers which would use
expected callout to query hardware periodically for updates.

Reviewed by:	rscheff	(transport, stack integration), np
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44219
2024-07-12 07:27:58 +03:00
Konstantin Belousov
00524fd475 ipsec_output(): add mtu argument
Similarly, mtu is needed to decide inline IPSEC offload for the driver.

Sponsored by: NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44224
2024-07-12 06:29:31 +03:00
Konstantin Belousov
de1da299da ipsec_output(): add outcoming ifp argument
The information about the interface is needed to coordinate inline
offloading of IPSEC processing with corresponding driver.

Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44223
2024-07-12 06:29:31 +03:00
Konstantin Belousov
41106f5aa0 netipsec/xform_esp.c: make esp_ctr_compatibility global
Sponsored by:	NVIDIA networking
2024-07-12 06:29:31 +03:00
Konstantin Belousov
54ac7b969f ipsec: make key_do_allocsp() global
Sponsored by:	NVIDIA networking
2024-07-12 06:29:31 +03:00
Lexi Winter
50ecbc5142 libipsec: make const-correct
- add const to the appropriate places in the libipsec public API and the
  relevant internal functions needed to support that.

- replace caddr_t with c_caddr_t in ipsec_dump_policy()

- update the ipsec_dump_policy manpage to use c_caddr_t (this manpage
  was already wrong as it had "char *" instead of caddr_t previously).

While here, update pfkeyv2.h to not cast away const in the PFKEY_*()
macros.

This should not cause any ABI changes as the actual types have not
changed.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1099
2024-04-22 22:36:34 -06:00
Lexi Winter
122dd78c14 sys/netipsec: fix IPSEC_SUPPORT for non-INET kernels
The functions ipsec_kmod_udp_input() and ipsec_kmod_udp_pcbctl() are
used by netinet6 for IPSEC_SUPPORT, but are guarded behind #ifdef INET.

Since neither of these require INET, remove the guard so they're built
even without INET.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1158
2024-04-12 10:30:22 -06:00
Konstantin Belousov
1a56620b79 ipsec esp: avoid dereferencing freed secasindex
It is possible that SA was removed while processing packet, in which
case it is changed to the DEAD state and its index is removed from the
tree. Dereferencing sav->sah then touches freed memory.

Reviewed by:	ae
Sponsored by:	NVIDIA networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44079
2024-02-26 16:27:46 +02:00
Kristof Provost
c3d7bb5aca netipsec: fix LINT-NOINET build
udp_ipsec_input() is used for INET6, so we need it even in NOINET
builds. Build the relevant file if either of INET or INET6 are set.
2024-01-20 22:22:21 +01:00
Allan Jude
dc02374f54 Fix KASSERT in 80044c78 causing build failures
Move the KASSERT to where struct ip6_hdr is populated

Fixes:		80044c785c
Reported-by:	bapt
Reviewed-by:	markj
Sponsored-by:	Klara, Inc.
2024-01-16 23:15:00 +00:00
Xavier Beaudouin
80044c785c Add UDP encapsulation of ESP in IPv6
This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.

Co-authored-by:	Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by:	Stormshield
Sponsored-by:	Wiktel
Sponsored-by:	Klara, Inc.
2024-01-16 20:44:34 +00:00
Gleb Smirnoff
296a4cb5c5 sockets: provide correct pr_shutdown for keysock and SDP
My failure to run all kinds of kernel builds led to missing the keysock
and incorrectly assuming SDP as not having a shutdown method.

Fixes:	5bba272807
2024-01-16 12:02:59 -08:00
Warner Losh
fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Michael Tuexen
0fb0711dba tcp: fix TCP MD5 digest computation for TCP over UDP
Skip the UDP header for the computation. This is similar to
skipping IPv6 extension headers.

Reviewed by:		cc, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40596
2023-06-21 22:48:12 +02:00
Mark Johnston
056305d3aa ipsec: Make algorithm tables read-only
No functional change intended.

MFC after:	1 week
2023-06-02 13:43:15 -04:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Konstantin Belousov
04d815f115 netipsec/key.c: use designated initializers for arrays
Also de-expand nitems() use in related asserts, and fix maxsize array
name in the assert message.

Sponsored by:	NVidia networking
2023-04-25 09:41:24 +03:00
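
The designated-initializer change described in 04d815f115 above can be illustrated with a minimal user-space sketch. Everything here is an assumption made for illustration: the enum `ext_sketch`, the table `ext_minsize`, and its values are hypothetical and are not the tables in netipsec/key.c, and `nitems()` is defined locally instead of being taken from the kernel headers. The point is only that each slot is tied to its index by name and that the size check is written with `nitems()`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's nitems() macro (assumption). */
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))

/* Hypothetical extension types; not the real SADB_EXT_* constants. */
enum ext_sketch {
	EXT_RESERVED,
	EXT_SA,
	EXT_LIFETIME,
	EXT_ADDRESS,
	EXT_MAX = EXT_ADDRESS
};

/*
 * Designated initializers bind every value to its index explicitly, so
 * reordering or inserting an enum member cannot silently shift the table.
 * The sizes are made-up illustration values.
 */
static const size_t ext_minsize[] = {
	[EXT_RESERVED] = 0,
	[EXT_SA] = 16,
	[EXT_LIFETIME] = 32,
	[EXT_ADDRESS] = 8,
};

/* Compile-time check that the table covers every extension type. */
_Static_assert(nitems(ext_minsize) == EXT_MAX + 1,
    "ext_minsize size mismatch");
```

Any index left out of such a table is zero-initialized, which is why a size assertion is still useful alongside the designated form.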
Konstantin Belousov
fcc7aabdca netipsec: some style
Sponsored by:	NVidia networking
2023-04-25 09:39:51 +03:00
Mateusz Guzik
889a9acc54 ipsec: only update lastused when it changes
to limit cache-line bouncing.

Note that as there is no atomic_store we are hoping the compiler won't
speculatively do the store. It is not employed because the size depends
on target arch.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D38433
2023-02-16 07:33:51 +00:00
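
The idea in 889a9acc54 above, skipping the store when the timestamp has not changed, can be sketched in plain user-space C. This is a hedged illustration, not the kernel code: `struct sa_sketch` and `sa_touch()` are hypothetical names, and the boolean return value exists only so the behaviour can be observed. As in the commit message, a plain store is used, so nothing here prevents a compiler from storing speculatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the relevant SA state; not struct secasvar. */
struct sa_sketch {
	time_t lastused;	/* second-granularity time of last use */
};

/*
 * Record a use of the SA.  When the stored timestamp already equals
 * "now", the function only reads the field, so CPUs handling packets for
 * the same SA within one second do not keep dirtying the cache line.
 * Returns true if a store was performed.
 */
static bool
sa_touch(struct sa_sketch *sa, time_t now)
{
	assert(sa != NULL);
	if (sa->lastused == now)
		return (false);
	sa->lastused = now;
	return (true);
}
```

With a one-second clock, at most one store per second per SA reaches memory under this scheme, however many packets are processed.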
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Konstantin Belousov
424f1296bd ipsec.c: typos in the comment
Sponsored by:	NVIDIA Networking
MFC after:	3 days
2023-01-18 23:22:35 +02:00
Mark Johnston
8a9495517b ipsec: Clear pad bytes in PF_KEY messages
Various handlers for SADB messages will allocate a new mbuf and populate
some structures in it.  Some of these structures, such as struct
sadb_supported, contain small reserved fields that are not initialized
and are thus leaked to userspace.

Fix the problem by adding a helper to allocate zeroed mbufs.  This
reduces code duplication and the overhead of zeroing these messages
isn't harmful.

Reviewed by:	zlei, melifaro
Reported by:	KMSAN
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D38068
2023-01-16 11:27:54 -05:00
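
The fix in 8a9495517b above relies on a helper that hands out zeroed buffers, so reserved fields never carry stale memory to userspace. A minimal user-space sketch of that pattern follows. It is an assumption-laden illustration: `struct msg_sketch` and `msg_alloc_zeroed()` are hypothetical, and `calloc()` stands in for the kernel's zeroed mbuf allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical message header with a reserved field, loosely modelled on
 * the shape of PF_KEY structures; not a real sadb_* definition.
 */
struct msg_sketch {
	uint16_t len;
	uint16_t exttype;
	uint32_t reserved;	/* never set explicitly by handlers */
};

/*
 * Allocate a message that is zero-filled before any field is populated.
 * Handlers then set only the fields they care about; reserved and pad
 * bytes stay zero instead of exposing whatever the allocator returned.
 */
static struct msg_sketch *
msg_alloc_zeroed(uint16_t exttype)
{
	struct msg_sketch *m;

	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->len = (uint16_t)sizeof(*m);
	m->exttype = exttype;
	return (m);
}
```

Centralising the zeroing in one allocator means individual handlers cannot forget it, which matches the commit's note about reduced code duplication.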
John Baldwin
b357d40f08 kdebug_secasv: Update for recent locking changes.
Reviewed by:	kp
Fixes:		0361f165f2 ipsec: replace SECASVAR  mtx by rmlock
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37690
2022-12-15 11:27:39 -08:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most important one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temporary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
Mateusz Guzik
c1bfe8c593 ipsec: add key_havesp_any
Saves on work in a common case of checking both directions.

Note further work in the area is impending to elide these in the common
case to begin with.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36485
2022-11-22 12:23:08 +00:00
Mateusz Guzik
86104d3ebb ipsec: prohibit unknown directions in key_havesp
Eliminates a branch checking for its validity.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36485
2022-11-22 12:23:03 +00:00
Kristof Provost
9f8f3a8e9a ipsec: add support for CHACHA20POLY1305
Based on a patch by ae@.

Reviewed by:	gbe (man page), pauamma (man page)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D37180
2022-11-02 14:19:04 +01:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of these checks and turn them into
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Hans Petter Selasky
9f69c0b87d Fix kernel build after fcb3f813f3 .
By updating function arguments for ipsec_kmod_ctlinput() which is used
when loading IPSEC support via kernel modules.

Differential Revision:	https://reviews.freebsd.org/D36731
Sponsored by:	NVIDIA Networking
2022-10-04 15:42:51 +02:00
Gleb Smirnoff
fcb3f813f3 netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack, the protocol control
input method pr_ctlinput was used to notify the protocols about two very
different kinds of events: internal system events and receipt of
ICMP messages from outside.  These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input().  The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header.  This
  allows protocols to not deduce it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
  It has all the information needed to the protocols.  In the structure,
  change ip6c_finaldst fields to sockaddr_in6.  The reason is that
  icmp6_input() already has this address wrapped in sockaddr, and the
  protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
  change the prototypes to accept a transparent union of either ICMP
  header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
  count bad packets.  The translation of ICMP codes to errors/actions is
  done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
  inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommends,
  or do our own logic based on the ICMP header.

Differential revision:	https://reviews.freebsd.org/D36731
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
809fef2913 netipsec: move specific ipsecmethods declarations to ipsec_support.h
where struct ipsec_methods is defined.  Not a functional change.
Allows further modification of method prototypes without breaking
compilation of other ipsec compilation units.

Differential revision:	https://reviews.freebsd.org/D36730
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
46ddeb6be8 netinet6: retire ip6protosw.h
The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with
the former removed in f0ffb944d2 in 2001 and the latter surviving until
today.  It has been reduced down to only one useful declaration that
moves to ip6_var.h

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36726
2022-10-03 20:53:04 -07:00