Commit graph

5318 commits

Author SHA1 Message Date
Kristof Provost
271f146955 pf: vnet-ify pf_hashsize, pf_hashmask, pf_srchashsize and V_pf_srchashmask
These variables are tunables, so in principle they never change at runtime.
That would mean they don't need to be tracked per-vnet.

However, they both can be decreased (back to their default values) if the
memory allocations for their respective tables fail, and these allocations are
per-vnet. That is, it's possible for a few vnets to be started and have the
tuned size for the hash and srchash tables only to have later vnets fail the
initial allocation and fall back to smaller allocations. That would confuse
the previously created vnets (because their actual table size and size/mask
variables would no longer match).

Avoid this by turning these into per-vnet variables.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-07-17 16:00:49 +02:00
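
Note on 271f146955: the mismatch described above can be modelled in userland. This is a minimal sketch rather than the pf code. `struct vnet_model`, `table_init()`, `bucket()` and the `DEFAULT_HASHSIZE` value are hypothetical names chosen for illustration, and the allocation failure is simulated with a flag. Because the size and mask live in each vnet's own state, a fallback in one vnet leaves the others untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_HASHSIZE 8u	/* hypothetical default, a power of two */

/* Hypothetical stand-in for one vnet's private pf state. */
struct vnet_model {
	uint32_t hashsize;	/* number of buckets actually allocated */
	uint32_t hashmask;	/* hashsize - 1, used to pick a bucket */
	int *table;
};

/*
 * Allocate the table for one vnet.  If the tuned size cannot be
 * allocated (simulated here with fail_tuned), fall back to the default.
 * The size and mask are stored in the vnet itself, so a fallback here
 * cannot disturb vnets that were set up earlier.
 */
static bool
table_init(struct vnet_model *v, uint32_t tuned, bool fail_tuned)
{
	uint32_t size = fail_tuned ? DEFAULT_HASHSIZE : tuned;

	v->table = calloc(size, sizeof(*v->table));
	if (v->table == NULL)
		return (false);
	v->hashsize = size;
	v->hashmask = size - 1;
	return (true);
}

/* Bucket index for a hash value; always within this vnet's table. */
static uint32_t
bucket(const struct vnet_model *v, uint32_t hash)
{
	uint32_t idx = hash & v->hashmask;

	assert(idx < v->hashsize);
	return (idx);
}
```

With a single global size and mask, the second `table_init()` call would have shrunk the mask used by the first vnet as well, which is the confusion the commit message describes.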
Kristof Provost
d909f06b90 pf: convert DIOCADDADDR to netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-07-17 07:52:55 +02:00
Mark Johnston
ec1b18c735 route: Wrap long lines
No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
2024-07-14 14:29:15 -04:00
Konstantin Belousov
240b7bfe56 ipsec_offload: offload inner checksums calculations for UDP/TCP/TSO
and allow the interface driver to declare such support.

Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44221
2024-07-12 07:27:58 +03:00
Konstantin Belousov
2131654bde sys/net: Add IPSEC_OFFLOAD interface cap and methods structure
Reviewed by:	glebius
Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44314
2024-07-12 06:29:32 +03:00
Konstantin Belousov
b256ff9303 sys/pfkeyv2.h: define extensions for ipsec inline accel control
The extensions allow restricting the interface on which an SP or SA is
offloaded, and receiving software and hardware offload counters for a given SA.

Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44316
2024-07-12 06:29:31 +03:00
Konstantin Belousov
00524fd475 ipsec_output(): add mtu argument
Similarly, mtu is needed to decide inline IPSEC offload for the driver.

Sponsored by: NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44224
2024-07-12 06:29:31 +03:00
Konstantin Belousov
de1da299da ipsec_output(): add outcoming ifp argument
The information about the interface is needed to coordinate inline
offloading of IPSEC processing with corresponding driver.

Sponsored by:	NVIDIA networking
Differential revision:	https://reviews.freebsd.org/D44223
2024-07-12 06:29:31 +03:00
Konstantin Belousov
7539b04ed7 ipsec_newpolicies(): do not call key_freesp() with NULL value
Sponsored by:	NVIDIA networking
MFC after:	1 week
2024-07-12 06:29:30 +03:00
Zhenlei Huang
09164454aa ethernet: Retire M_HASFCS
The mbuf flag M_HASFCS was introduced for drivers to indicate to the net
stack that packets include FCS (Frame Check Sequence). In principle, to
be efficient, FCS should always be processed by hardware, firmware, or,
as a last resort, the driver. Ethernet specifies that damaged frames
should be discarded, so only good ones will be passed up to the net
stack, and it makes no sense for the net stack to see FCS just to trim
it.

The last consumer of the flag M_HASFCS has been removed since change [1].
It is time to retire it.

1. 105a4f7b3c ng_atmllc: remove

Reviewed by:	kp
MFC after:	never
Differential Revision:	https://reviews.freebsd.org/D42391
2024-07-05 00:53:51 +08:00
Zhenlei Huang
a2cac544a6 if_clone: Allow maxunit to be zero
Some drivers, e.g. if_enc(4), only allow one instance to be created, but
the KPI ifc_attach_cloner() treats a zero value of maxunit as unlimited,
aka IF_MAXUNIT.

Introduce a new flag IFC_F_LIMITUNIT to indicate that the requested
maxunit is limited and should be respected.

Consumers should use the new flag if there is an intended limit.

Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D45757
2024-07-03 21:14:08 +08:00
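
Note on a2cac544a6: a minimal sketch of the ambiguity, under stated assumptions. `effective_maxunit()`, `MODEL_MAXUNIT` and `MODEL_F_LIMITUNIT` are hypothetical stand-ins for the real cloner logic, IF_MAXUNIT and IFC_F_LIMITUNIT. The numeric values, and the choice to respect a non-zero maxunit even without the flag, are assumptions of the sketch and are not taken from the commit.

```c
#include <assert.h>

#define MODEL_MAXUNIT		0x7fffu	/* assumed stand-in for IF_MAXUNIT */
#define MODEL_F_LIMITUNIT	0x1	/* assumed stand-in for IFC_F_LIMITUNIT */

/*
 * Resolve the unit limit a cloner will enforce.  Without the flag, a
 * zero maxunit keeps its historical meaning of "no limit".  With the
 * flag, the requested value is taken literally, so zero means that only
 * unit 0 may exist.
 */
static unsigned int
effective_maxunit(unsigned int maxunit, int flags)
{
	if (maxunit == 0 && (flags & MODEL_F_LIMITUNIT) == 0)
		return (MODEL_MAXUNIT);
	assert(maxunit <= MODEL_MAXUNIT);
	return (maxunit);
}
```

A single-instance driver in the style of if_enc(4) would pass a maxunit of zero together with the flag and get a limit of zero back, instead of the unlimited default.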
Zhenlei Huang
087f5e08ab if_vxlan(4): Plug a memory leak
On clone creation, a failure from either vxlan_set_user_config() or
ifc_copyin() will result in leaking the previously allocated counters.

Since counter_u64_alloc(M_WAITOK) never fails, make vxlan_stats_alloc()
void and move the allocation for counters below checking ifd->params to
avoid memory leak.

Reviewed by:	kp, glebius
Fixes:	b092fd6c97 if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45822
2024-07-02 12:57:02 +08:00
Zhenlei Huang
d6963b9ed3 if_vxlan(4): Exclude ETHER_CRC_LEN from macro VXLAN_MAX_MTU
The encapsulated (original) frame does not count in FCS as per Section 5
of RFC 7348.

Reviewed by:	afedorov, bryanv, #network
Fixes:		b7592822d5 Allow set MTU more than 1500 bytes
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45195
2024-07-02 12:57:01 +08:00
Zhenlei Huang
9738277b5c ifnet: Remove dead code
Since change [1], if_bpf will not be detached by the interface departure
eventhandler and will not be NULL. Then the logic to re-attach if_bpf
becomes dead and serves no purpose any more.

This partially reverts commit 05fc416403.

1. 9ce40d321d bpf: Fix incorrect cleanup

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45599
2024-06-30 17:44:21 +08:00
Zhenlei Huang
aa3860851b net: Remove unneeded NULL check for the allocated ifnet
Change 4787572d05 made if_alloc_domain() never fail, so the wrappers
if_alloc(), if_alloc_dev(), and if_gethandle() never fail either.

No functional change intended.

Reviewed by:	kp, imp, glebius, stevek
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D45740
2024-06-28 18:16:29 +08:00
Zhenlei Huang
ef4f4a44d9 ifnet: Restore curvnet earlier
This improves readability a little. As a side effect, a redundant
CURVNET_RESTORE is removed.

No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45595
2024-06-27 12:38:04 +08:00
Zhenlei Huang
2cb7605a24 lo: Use new KPI to create the first loop interface
While here remove a pointless static local variable lo_cloner.

No functional change intended.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45728
2024-06-26 18:00:37 +08:00
Mark Johnston
02cbf9ebf1 lagg: Fix a teardown race
When a lagg interface is destroyed, it destroys all of the lagg ports,
which triggers an asynchronous link state change handler.  This in turn
may generate a netlink message, a portion of which requires netlink to
invoke the SIOCGIFMEDIA ioctl of the lagg interface, which involves
scanning the list of interface media.  This list is not internally
locked, it requires the interface driver to provide some kind of
synchronization.

Shortly after the link state notification has been raised, the lagg
interface detaches itself from the network stack.  As a part of this, it
blocks in order to wait for link state handlers to drain, but before
that it destroys the interface media list.  Reverse this order of
operations so that the link state change handlers drain first, avoiding
a use-after-free that is very occasionally triggered by lagg stress
tests.  This matches other ethernet drivers in the tree.

MFC after:	2 weeks
2024-06-24 10:47:29 -04:00
Mark Johnston
66b8cac8d8 pf: Sprinkle const qualifiers in state lookup routines
State keys are trivially const in lookup routines, so annotate them as
such.  No functional change intended.

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum
Differential Revision:	https://reviews.freebsd.org/D45671
2024-06-24 10:46:55 -04:00
Zhenlei Huang
71f8fbf9bd ifnet: Use NET_EPOCH_WAIT() macro
This makes it easier to grep the usage.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45715
2024-06-24 17:57:14 +08:00
Mateusz Guzik
b6196537b0 pf: fix the "keepcounters" to stop truncating to 32-bit
The machinery to support 64-bit counters even on 32-bit kernels had a
bug where it would unintentionally truncate the value back to 32 bits
when transferring to a new counter. This resulted in buggy behavior
on 64-bit kernels as well.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-06-20 17:55:43 +00:00
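
Note on b6196537b0: the commit message does not show the faulty code, so the following is only an illustration of this class of bug, with hypothetical function names. A 64-bit counter value that passes through a 32-bit temporary loses its upper half, regardless of the kernel's word size.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy transfer: the value passes through a 32-bit temporary. */
static uint64_t
transfer_truncating(uint64_t old)
{
	uint32_t tmp = (uint32_t)old;	/* upper 32 bits are lost here */

	return (tmp);
}

/* Fixed transfer: the value stays 64 bits wide throughout. */
static uint64_t
transfer_full(uint64_t old)
{
	return (old);
}
```

For a counter holding 0x100000005, the truncating version carries over 5 while the full-width version carries over the original value.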
Kristof Provost
ba2a920786 pf: convert DIOCBEGINADDRS to netlink
2024-06-08 04:46:43 +02:00
Kristof Provost
d9ab899931 pf: migrate DIOCGETLIMIT/DIOCSETLIMIT to netlink
Event:		Kitchener-Waterloo Hackathon 202406
2024-06-07 20:59:02 +02:00
Zhenlei Huang
0dfd11abc4 bpf: Make bpf_peers_present a boolean inline function
This function was introduced in commit [1] and is actually used as a
boolean function although it was not defined as so.

No functional change intended.

1. 16d878cc99 Fix the following bpf(4) race condition which can result in a panic

Reviewed by:	markj, kp, #network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45509
2024-06-07 23:06:08 +08:00
Zhenlei Huang
89204d9dcb bpf: Prefer the boolean form when calling bpf_peers_present()
No functional change intended.

Reviewed by:	markj, kp, #network
MFC with:	8f31b879ec
Differential Revision:	https://reviews.freebsd.org/D45509
2024-06-07 23:06:07 +08:00
Kristof Provost
30bad751e8 pf: convert DIOCGETTIMEOUT/DIOCSETTIMEOUT to netlink
2024-06-06 20:46:18 +02:00
Zhenlei Huang
215a18d502 if_enc(4): Prefer the boolean form when calling bpf_peers_present()
No functional change intended.

MFC after:	1 week
2024-06-06 12:20:26 +08:00
Kristof Provost
9dbbe68bc5 pf: convert DIOCCLRSTATUS to netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-06-04 14:59:58 +02:00
Kristof Provost
6ee3e37682 pf: fix incorrect anchor_call to userspace
777a4702c changed how we copy out the anchor_call string, and
incorrectly limited it to 8 (4 on 32-bit systems) bytes. Fix that so we
get the full anchor path, rather than just the first few characters.

PR:		279225
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-05-28 22:27:22 +02:00
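
Note on 6ee3e37682: a limit of 8 bytes (4 on 32-bit systems) is consistent with a copy length derived from the size of a pointer instead of the size of the buffer. The sketch below illustrates that pattern in userland. It is an assumption about the class of bug, not the pf code, and `copy_buggy()`, `copy_fixed()` and `PATH_BUF_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF_LEN 64	/* hypothetical buffer size */

/* Buggy: the length is the size of a pointer, not of the buffer. */
static void
copy_buggy(char *dst, const char *src)
{
	size_t len = sizeof(char *);	/* 8 on LP64, 4 on ILP32 */

	snprintf(dst, len, "%s", src);
}

/* Fixed: the caller passes the real buffer length. */
static void
copy_fixed(char *dst, size_t dstlen, const char *src)
{
	snprintf(dst, dstlen, "%s", src);
}
```

With the buggy variant only the first few characters of the path survive, one fewer than the pointer size because of the terminating NUL.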
Kristof Provost
bdd12889ea if_vlan: handle VID conflicts
If we fail to change the vlan id we have to undo the removal (and vlan id
change) in the error path. Otherwise we'll have removed the vlan object from the
hash table, and have the wrong vlan id as well. Subsequent modification attempts
will then try to remove an entry which doesn't exist, and panic.

Undo the vlan id modification if the insertion in the hash table fails, and
re-insert it under the original vlan id.

PR:		279195
Reviewed by:	zlei
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D45285
2024-05-22 09:08:02 +02:00
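
Note on bdd12889ea: the rollback described above can be sketched with a toy table indexed by VLAN ID. This is a simplified model under stated assumptions. `struct vlan_model`, `table_insert()`, `table_remove()`, `vlan_change_vid()` and the table size `NVID` are hypothetical, and the real driver uses a hash table rather than an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NVID 16	/* hypothetical, tiny table for illustration */

struct vlan_model {
	int vid;
};

static struct vlan_model *table[NVID];

/* Insert under v->vid; fails if the ID is out of range or taken. */
static int
table_insert(struct vlan_model *v)
{
	if (v->vid < 0 || v->vid >= NVID)
		return (EINVAL);
	if (table[v->vid] != NULL)
		return (EEXIST);
	table[v->vid] = v;
	return (0);
}

/* Removing an entry that is not in the table models the panic case. */
static void
table_remove(struct vlan_model *v)
{
	assert(table[v->vid] == v);
	table[v->vid] = NULL;
}

/* Change the ID; on failure restore both the ID and the table entry. */
static int
vlan_change_vid(struct vlan_model *v, int newvid)
{
	int oldvid = v->vid;
	int error;

	table_remove(v);
	v->vid = newvid;
	error = table_insert(v);
	if (error != 0) {
		v->vid = oldvid;
		(void)table_insert(v);	/* the old slot was just vacated */
	}
	return (error);
}
```

Without the two lines in the error branch, a failed change would leave the object out of the table with the wrong ID, and the next change attempt would trip the check in `table_remove()`.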
Zhenlei Huang
93fbfef0b5 if_vxlan(4): Add checking for loops and nesting of tunnels
User misconfiguration, either tunnel loops, or a large number of
different nested tunnels, can overflow the kernel stack. Prevent that
by using if_tunnel_check_nesting().

PR:		278394
Diagnosed by:	markj
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45197
2024-05-20 20:14:07 +08:00
Kristof Provost
59a6666ec9 if_ovpn: cope with loops
User misconfiguration may lead to routing loops where we try to send the tunnel
packet into the tunnel. This eventually leads to stack overflows and panics.

Avoid this using if_tunnel_check_nesting(), which will drop the packet if we're
looping or we hit three layers of nested tunnels.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-05-13 12:11:06 +02:00
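
Note on 59a6666ec9: a minimal userland model of the idea behind a nesting check. It does not reproduce if_tunnel_check_nesting() or its signature. `struct pkt_model`, `nesting_ok()` and `MAX_NESTING` are hypothetical, and whether the limit of three is inclusive is an assumption of the sketch. The packet remembers which tunnels it has already passed through, and a send is refused on a repeat or once the limit is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NESTING 3	/* assumed limit, taken from the commit message */

/* Hypothetical per-packet record of the tunnels already traversed. */
struct pkt_model {
	const void *seen[MAX_NESTING];
	int depth;
};

/*
 * Return true if the packet may enter this tunnel.  A tunnel that is
 * already recorded means a loop; a full record means too many layers.
 */
static bool
nesting_ok(struct pkt_model *p, const void *tunnel)
{
	for (int i = 0; i < p->depth; i++) {
		if (p->seen[i] == tunnel)
			return (false);
	}
	if (p->depth >= MAX_NESTING)
		return (false);
	p->seen[p->depth++] = tunnel;
	return (true);
}
```

Because the record travels with the packet, the recursion is bounded no matter how the routes are misconfigured.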
Gleb Smirnoff
fadbb6f85a lagg: remove use of net epoch in the ioctl paths
Rely on LAGG_SLOCK() instead.  The use of network epoch(9) here was added
in 6573d7580b (later tidied by 87bf9b9cbe) as a large sweep that
blindly substituted blocking kernel primitives with epoch(9).  In these
particular code paths use of epoch(9) is incorrect and doesn't provide any
protection against a stale pointer.  Recent fix 48698ead6f, which should
actually have removed the epoch use, created a potential sleeping in epoch
problem.
2024-05-06 15:27:32 -07:00
Gleb Smirnoff
570685971c lagg: propagate up/down to the children
Based on the old submission from asomers@.  With the modern state of locking
in lagg(4), the patch got much simpler.  Enable the test that was
waiting for this change.

PR:			226144
Reviewed by:		asomers
Differential Revision:	https://reviews.freebsd.org/D44605
2024-05-06 15:27:32 -07:00
Kristof Provost
43387b4e57 if: guard against if_ioctl being NULL
There are situations where a struct ifnet has a NULL if_ioctl pointer.

For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.

If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR:		275920
MFC after:	3 days
Sponsored by:   Rubicon Communications, LLC ("Netgate")
2024-05-06 11:39:08 +02:00
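
Note on 43387b4e57: the guard can be sketched as follows. `struct ifnet_model`, `ifnet_ioctl_guarded()` and `dummy_ioctl()` are hypothetical userland stand-ins. Only the behaviour of returning EOPNOTSUPP when the handler is NULL comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet with just the ioctl hook. */
struct ifnet_model {
	int (*if_ioctl)(struct ifnet_model *, unsigned long, void *);
};

/* Call the driver's ioctl handler, or report "not supported". */
static int
ifnet_ioctl_guarded(struct ifnet_model *ifp, unsigned long cmd, void *data)
{
	if (ifp->if_ioctl == NULL)
		return (EOPNOTSUPP);
	return (ifp->if_ioctl(ifp, cmd, data));
}

/* Trivial handler used to exercise the non-NULL path. */
static int
dummy_ioctl(struct ifnet_model *ifp, unsigned long cmd, void *data)
{
	(void)ifp;
	(void)cmd;
	(void)data;
	return (0);
}
```

An interface created only to reach the MII code, as in the e6000sw example, takes the first branch and the caller sees an ordinary error instead of a NULL function pointer call.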
Zhenlei Huang
73585176ff if_bridge: Minor style fixes
And more comments on the #ifdef INET blocks to improve readability.

While here, revert the order of two prototypes to produce minimal diff
compared to stable branches.

MFC with:	65767e6126
2024-04-26 02:19:11 +08:00
Lexi Winter
65767e6126 sys/net/if_bridge: support non-INET kernels
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1159
2024-04-23 15:13:00 -06:00
Denny Page
fcdf9a1989 Support ARP for 802 networks
This is used by 802.3 Ethernet.  (It is also used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
2024-04-23 12:30:53 -04:00
Lexi Winter
50ecbc5142 libipsec: make const-correct
- add const to the appropriate places in the libipsec public API and the
  relevant internal functions needed to support that.

- replace caddr_t with c_caddr_t in ipsec_dump_policy()

- update the ipsec_dump_policy manpage to use c_caddr_t (this manpage
  was already wrong as it had "char *" instead of caddr_t previously).

While here, update pfkeyv2.h to not cast away const in the PFKEY_*()
macros.

This should not cause any ABI changes as the actual types have not
changed.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1099
2024-04-22 22:36:34 -06:00
Lexi Winter
ef84dd8f49 if_bridge: clean up INET/INET6 handling
The if_bridge contains several instances of:

	if (AF_INET code ...
	#ifdef INET6
	    AF_INET6 code ...
	#endif
	) {
		...

Clean this up by adding a couple of macros at the top of the file that
are conditionally defined based on whether INET and/or INET6 are enabled,
which makes the code more readable and easier to maintain.

No functional change intended.

Reviewed by:	zlei, markj
MFC after:	1 week
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1191
2024-04-22 12:01:27 -04:00
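
Note on ef84dd8f49: the cleanup pattern described above, sketched with hypothetical macro names (`FAMILY_IS_INET`, `FAMILY_IS_INET6`); the macros the commit actually adds may be named and shaped differently. Each macro collapses to a constant false when its address family is compiled out, so call sites need no `#ifdef`. The `#define INET` line stands in for the kernel configuration option, and INET6 is deliberately left undefined to show the effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

#define INET	/* stand-in for the kernel option; INET6 is left out */

#ifdef INET
#define	FAMILY_IS_INET(af)	((af) == AF_INET)
#else
#define	FAMILY_IS_INET(af)	(false)
#endif

#ifdef INET6
#define	FAMILY_IS_INET6(af)	((af) == AF_INET6)
#else
#define	FAMILY_IS_INET6(af)	(false)
#endif

/* The condition reads the same whichever families are enabled. */
static bool
family_supported(int af)
{
	return (FAMILY_IS_INET(af) || FAMILY_IS_INET6(af));
}
```

In this configuration `family_supported(AF_INET6)` is false at compile time, and the compiler can drop the dead branch just as the hand-written `#ifdef` would.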
Seth Hoffert
2cb0fce24d bpf: Make BPF interop consistent with if_loop
The pseudo_AF_HDRCMPLT check is already being done in if_loop and
just needed to be ported over to if_ic, if_wg, if_disc, if_gif,
if_gre, if_me, if_tuntap and ng_iface.  This is needed in order to
allow these interfaces to work properly with e.g., tcpreplay.

PR:		256587
Reviewed by:	markj
MFC after:	2 weeks
Pull Request:	https://github.com/freebsd/freebsd-src/pull/876
2024-04-19 14:48:37 -04:00
Eric Joyner
ed34a6b6ea iflib: Add subinterface interrupt allocation function
The ice(4) driver will add the ability to create extra interfaces
that hang off of the base interface; to do that the driver requires
a method for the subinterface to request hardware interrupt resources
from the base interface.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39930
2024-04-18 16:14:02 -07:00
Eric Joyner
3c7da27a47 iflib: Add sysctl to request extra MSIX vectors on driver load
Intended to be used with upcoming feature to add sub-interfaces, since
those new interfaces will be dynamically created and will need to have
spare MSI-X interrupts already allocated for them on driver load.

This sysctl is marked as a tunable since it will need to be set before
the driver is loaded since MSI-X interrupt allocation and setup is
done during the attach process.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D41326
2024-04-18 16:13:47 -07:00
Stephen J. Kiernan
e4a0c92e7a iflib: Correct indentation according to style(9)
The indentation style for the SYSCTL_* macros used was not matching KNF.

Reported by:	jhb
Differential Revision:	https://reviews.freebsd.org/D44811
2024-04-16 16:36:25 -04:00
Stephen J. Kiernan
303dea74c2 iflib: Fix compiler warnings
Some of the QUAD sysctls are actually for unsigned quad values.
Switch to using UQUAD instead, as that is meant for unsigned.

Reviewed by:	erj, jhb
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44620
2024-04-15 10:57:52 -04:00
Zhenlei Huang
6fe4d8395b debugnet: Fix logging of frame length
MFC after:	1 week
2024-04-09 00:47:10 +08:00
Zhenlei Huang
e7102929bf ethernet: Fix logging of frame length
Both the mbuf length and the total packet length are signed.

While here, update a stale comment to reflect the current practice.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42390
2024-04-09 00:44:33 +08:00
Eugene Grosbein
319a5d086b if_bridge: use IF_MINMTU
Replace incorrect constant 576 with IF_MINMTU to check for minimum MTU.
This unbreaks bridging tap interfaces with small mtu.

MFC after:	1 week
2024-04-01 10:35:59 +07:00
Gleb Smirnoff
fa93ba4097 if_tuntap: simplify storage of per-vnet cloners
There is no need for a separate structure nor for a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.

Reviewed by:		zlei, kevans, kp
Differential Revision:	https://reviews.freebsd.org/D44307
2024-03-29 12:35:41 -07:00
Gleb Smirnoff
2497c70f81 vnet: remove unneeded backslash
Fixes:	430e0e409c
2024-03-15 12:17:04 -07:00