Commit graph

5287 commits

Author SHA1 Message Date
Kristof Provost
59a6666ec9 if_ovpn: cope with loops
User misconfiguration may lead to routing loops where we try to send the tunnel
packet into the tunnel. This eventually leads to stack overflows and panics.

Avoid this using if_tunnel_check_nesting(), which will drop the packet if we're
looping or we hit three layers of nested tunnels.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-05-13 12:11:06 +02:00
Gleb Smirnoff
fadbb6f85a lagg: remove use of net epoch in the ioctl paths
Rely on LAGG_SLOCK() instead.  The use of network epoch(9) here was added
in 6573d7580b (later tidied by 87bf9b9cbe) as a large sweep that
blindly substituted blocking kernel primitives with epoch(9).  In these
particular code paths use of epoch(9) is incorrect and doesn't provide any
protection against a stale pointer.  Recent fix 48698ead6f, which should
actually have removed the epoch use, created a potential sleeping in epoch
problem.
2024-05-06 15:27:32 -07:00
Gleb Smirnoff
570685971c lagg: propagate up/down to the children
Based on the old submission from asomers@.  With modern state of locking
in lagg(4), the patch got much simplier.  Enable the test that was
waiting for this change.

PR:			226144
Reviewed by:		asomers
Differential Revision:	https://reviews.freebsd.org/D44605
2024-05-06 15:27:32 -07:00
Kristof Provost
43387b4e57 if: guard against if_ioctl being NULL
There are situations where an struct ifnet has a NULL if_ioctl pointer.

For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.

If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR:		275920
MFC ater:	3 days
Sponsored by:   Rubicon Communications, LLC ("Netgate")
2024-05-06 11:39:08 +02:00
Zhenlei Huang
73585176ff if_bridge: Minor style fixes
And more comments on the #ifdef INET blocks to improve readability.

While here, revert the order of two prototypes to produce minimal diff
compared to stable branches.

MFC with:	65767e6126
2024-04-26 02:19:11 +08:00
Lexi Winter
65767e6126 sys/net/if_bridge: support non-INET kernels
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1159
2024-04-23 15:13:00 -06:00
Denny Page
fcdf9a1989 Support ARP for 802 networks
This is used by 802.3 Ethernet.  (Also be used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b1, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b1 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166
2024-04-23 12:30:53 -04:00
Lexi Winter
50ecbc5142 libipsec: make const-correct
- add const to the appropriate places in the libipsec public API and the
  relevant internal functions needed to support that.

- replace caddr_t with c_caddr_t in ipsec_dump_policy()

- update the ipsec_dump_policy manpage to use c_caddr_t (this manpage
  was already wrong as it had "char *" instead of caddr_t previously).

While here, update pfkeyv2.h to not cast away const in the PFKEY_*()
macros.

This should not cause any ABI changes as the actual types have not
changed.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1099
2024-04-22 22:36:34 -06:00
Lexi Winter
ef84dd8f49 if_bridge: clean up INET/INET6 handling
The if_bridge contains several instances of:

	if (AF_INET code ...
	#ifdef INET6
	    AF_INET6 code ...
	#endif
	) {
		...

Clean this up by adding a couple of macros at the top of the file that
are conditionally defined based on whether INET and/or INET6 are enabled,
which makes the code more readable and easier to maintain.

No functional change intended.

Reviewed by:	zlei, markj
MFC after:	1 week
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1191
2024-04-22 12:01:27 -04:00
Seth Hoffert
2cb0fce24d bpf: Make BPF interop consistent with if_loop
The pseudo_AF_HDRCMPLT check is already being done in if_loop and
just needed to be ported over to if_ic, if_wg, if_disc, if_gif,
if_gre, if_me, if_tuntap and ng_iface.  This is needed in order to
allow these interfaces to work properly with e.g., tcpreplay.

PR:		256587
Reviewed by:	markj
MFC after:	2 weeks
Pull Request:	https://github.com/freebsd/freebsd-src/pull/876
2024-04-19 14:48:37 -04:00
Eric Joyner
ed34a6b6ea
iflib: Add subinterface interrupt allocation function
The ice(4) driver will add the ability to create extra interfaces
that hang off of the base interface; to do that the driver requires
a method for the subinterface to request hardware interrupt resources
from the base interface.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39930
2024-04-18 16:14:02 -07:00
Eric Joyner
3c7da27a47
iflib: Add sysctl to request extra MSIX vectors on driver load
Intended to be used with upcoming feature to add sub-interfaces, since
those new interfaces will be dynamically created and will need to have
spare MSI-X interrupts already allocated for them on driver load.

This sysctl is marked as a tunable since it will need to be set before
the driver is loaded since MSI-X interrupt allocation and setup is
done during the attach process.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D41326
2024-04-18 16:13:47 -07:00
Stephen J. Kiernan
e4a0c92e7a iflib: Correct indentation according to style(9)
The indentation style for the SYSCTL_* macros used was not matching KNF.

Reported by:	jhb
Differential Revision:	https://reviews.freebsd.org/D44811
2024-04-16 16:36:25 -04:00
Stephen J. Kiernan
303dea74c2 iflib: Fix compiler warnings
Some of the QUAD sysctls are actually for unsigned quad values.
Switch to using UQUAD instead, as that is meant for unsigned.

Reviewed by:	erj, jhb
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44620
2024-04-15 10:57:52 -04:00
Zhenlei Huang
6fe4d8395b debugnet: Fix logging of frame length
MFC after:	1 week
2024-04-09 00:47:10 +08:00
Zhenlei Huang
e7102929bf ethernet: Fix logging of frame length
Both the mbuf length and the total packet length are signed.

While here, update a stall comment to reflect the current practice.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42390
2024-04-09 00:44:33 +08:00
Eugene Grosbein
319a5d086b if_bridge: use IF_MINMTU
Replace incorrect constant 576 with IF_MINMTU to check for minumum MTU.
This unbreaks bridging tap interfaces with small mtu.

MFC after:	1 week
2024-04-01 10:35:59 +07:00
Gleb Smirnoff
fa93ba4097 if_tuntap: simplify storage of per-vnet cloners
There is no need for a separate structure neither for a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.

Reviewed by:		zlei, kevans, kp
Differential Revision:	https://reviews.freebsd.org/D44307
2024-03-29 12:35:41 -07:00
Gleb Smirnoff
2497c70f81 vnet: remove unneeded backslash
Fixes:	430e0e409c
2024-03-15 12:17:04 -07:00
Kristof Provost
706d465dae pf: convert kill/clear state to use netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D44090
2024-02-28 23:26:18 +01:00
Gleb Smirnoff
48698ead6f lagg: wrap lagg_port2req() into LAGG_SLOCK()
Although a port addition is coded in a sequence where first all softc
information is fulfilled and only then it is attached to the lagg, we
still need a locking primitive to guarantee cache invalidation.  Panic
observed in the wild shows that lacp_portreq() called via
lagg_port_ioctl(SIOCGLAGGPORT) immediately after port creation may see
lp->lp_psc as NULL and panic.  In the core file we will see valid data
all around.  A race via lagg_ioctl() wasn't observed but potentially
is possible.

Differential Revision:	https://reviews.freebsd.org/D43501
2024-02-23 17:56:46 -08:00
Tai-hwa Liang
25a5bb7318 net: bandaid for plugging a fw_com leak in fwip_detach()
Adding a temporary workaround for plugging a fw_com upon if_fwip unloading.

Steps to reproduce(needs two hosts connected with firewire):

  while true; do
    ifconfig fwip0 10.0.0.5 up
    fwcontrol -r
    ping -c 10.0.0.3
    kldunload if_fwip
  done

There's a chance that the unloading of if_fwip.ko triggers following warning:

	Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).

commit d79b6b8ec2 (origin/main, origin/HEAD)
2024-02-15 01:00:49 +00:00
Marius Strobl
ed81a15517 fib_algo(4): Lower level of algorithm switching messages to LOG_INFO
Otherwise, with the default flm_debug_level of LOG_NOTICE, it's rather
easy to trigger debug messages such as:
[fib_algo] inet.0 (bsearch4#18) rebuild_fd_flm: switching algo to
radix4_lockless

Also, the "severity" of these events generally only justifies LOG_INFO
and not LOG_NOTICE.

Reviewed by:	melifaro
2024-02-05 23:44:38 +01:00
Gleb Smirnoff
ce69e37369 Revert "sockets: retire sorflush()"
Provide a comment in sorflush() why the socket I/O sx(9) lock is actually
important.

This reverts commit 507f87a799.
2024-02-03 13:08:41 -08:00
Kristof Provost
777a4702c5 pf: implement addrule via netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-02-02 17:55:16 +01:00
Kristof Provost
ffeab76b68 pfil: PFIL_PASS never frees the mbuf
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).

If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).

This allows us to remove a few extraneous NULL checks.

Suggested by:	tuexen
Reviewed by:	tuexen, zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43617
2024-01-29 14:10:19 +01:00
Kristof Provost
e95025ed93 pflow: show socket status in verbose mode
Introduce a verbose output mode to pflowctl, and expose the status of
the socket to userspace. This can be helpful in debugging configuration
errors.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-25 17:37:51 +01:00
Gordon Bergling
ab6d773dbf rtsock: Fix a typo in a source code comment
- s/adddress/address/

MFC after:	3 days
2024-01-22 21:53:21 +01:00
Kristof Provost
63a5fe8343 pflow: limit to no more than 128 flow exporters
While there are no inherent limits to the number of exporters we're
likely to scale rather badly to very large numbers. There's also no
obvious use case for more than a handful. Limit to 128 exporters to
prevent foot-shooting.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-22 18:02:10 +01:00
Kristof Provost
54c62e3e5d pf: work around icmp6 packet-too-big not being sent when binat-ing
If we're applying NPTv6 we pass a packet with a modified source and/or
destination address to the network stack.

If that packet then turns out to be larger than the MTU of the sending
interface the stack will attempt to generate an icmp6 packet-too-big
error, but may fail to look up the appropriate source address for that
error message. Even if it does, pf would still have to undo the binat
operation inside the icmp6 packet so the sending host can make sense of
the error.

We can avoid both problems entirely by having pf also perform the MTU
check (taking the potential refragmentation into account), and
generating the icmp6 error directly in pf.

See also:	https://redmine.pfsense.org/issues/14290
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43499
2024-01-22 12:52:14 +01:00
Gordon Bergling
b4c94968d1 if_llatbl: Fix a typo in a KASSERT message
- s/entires/entries/

MFC after:	5 days
2024-01-20 21:00:22 +01:00
Gordon Bergling
a2fcd3af5c net: Fix two typos in source code comments
- s/strucutres/structures/

MFC after:	3 days
2024-01-20 17:28:12 +01:00
Gleb Smirnoff
507f87a799 sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease().  The latter is
only relevant for protocols that use generic socket buffers.

The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2).  The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety.  This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why do we carry it over.  I can't tell
if that sblock() made sense back then, but it doesn't make any today.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43415
2024-01-16 10:30:49 -08:00
Gleb Smirnoff
5bba272807 sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else.  This also fixes a
couple recent regressions and reduces risk of future regressions.  The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one.  Particularly for SCTP this change streamlines
a lot of code.

Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
  needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
  always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
  protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
  will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
  connect(2), can SO_LINGER or can accept(2).

There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9).  Those protocols partially supported
shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost
shutdown(2) support.  I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D43413
2024-01-16 10:30:37 -08:00
Kristof Provost
2be6f75707 pflow: Turn `pflowstats' statistics counters into per-CPU counters to make them mpsafe.
The weird interactions around `pflow_flows' and `sc_gcounter' replaced
by simple `pflow_flows' increment. Since the flow sequence is the 32
bits integer, the `sc_gcounter' type replaced by the type of uint32_t.

Obtained from:	OpenBSD
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43116
2024-01-16 09:45:55 +01:00
Kristof Provost
fc6e506996 pflow: add RFC8158 NAT support
Extend pflow(4) to send NAT44 Session Create and Delete events.
This applies only to IPFIX (i.e. proto version 10), and requires no
user configuration.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43114
2024-01-16 09:45:55 +01:00
Kristof Provost
85b71dcfc9 pflow: allow observation domain to be configured
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43113
2024-01-16 09:45:54 +01:00
Kristof Provost
0493260115 pf: store state creation/expiration timestamps with milisecond precision
The primary beneficiary is pflow(4), which expects milisecond precision
in timestamps.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43112
2024-01-16 09:45:54 +01:00
Kristof Provost
baf9b6d042 pf: allow pflow to be activated per rule
Only generate ipfix/netflow reports (through pflow) for the rules where
this is enabled. Reports can also be enabled globally through 'set
state-default pflow'.

Obtained from:	OpenBSD
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43108
2024-01-16 09:45:54 +01:00
Kristof Provost
f92d9b1aad pflow: import from OpenBSD
pflow is a pseudo device to export flow accounting data over UDP.
It's compatible with netflow version 5 and IPFIX (10).

The data is extracted from the pf state table. States are exported once
they are removed.

Reviewed by:	melifaro
Obtained from:	OpenBSD
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D43106
2024-01-16 09:45:53 +01:00
John Baldwin
8cb9b68f58 sys: Use mbufq_empty instead of comparing mbufq_len against 0
Reviewed by:	bz, emaste
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43338
2024-01-09 11:00:46 -08:00
Kristof Provost
948e8413ab pflog: pass the action to pflog directly
If a packet is malformed, it is dropped by pf(4).  The rule referenced
in pflog(4) is the default rule.  As the default rule is a pass
rule, tcpdump printed "pass" although the packet was actually
dropped. Use the actual action, rather than the rule's action, or an
attempt at guessing the correct action.

Inspired by OpenBSD's 'pflog(4) logs packet dropped by default rule with block.' commit.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-01-04 23:08:08 +01:00
Jose Luis Duran
3e8d829747 net: Fix typo (triple S)
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/955
2023-12-27 20:24:32 -07:00
Gleb Smirnoff
c1c55da49f pfil: don't leak pfil_head_t on interface detach
PR:		256714
Submitted by:	jcaplan@blackberry.com
2023-12-21 10:53:49 -08:00
Konstantin Belousov
0365e5fc90 if_tun: check device name
to avoid panic if the name already exists, which is possible with the
interface renaming.

PR:	266999
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43001
2023-12-12 06:02:11 +02:00
Kristof Provost
bd7b2f9501 vnet: (read) lock the vnet list while iterating it
Ensure that the vnet list cannot be modified while we're running through
it.

Reviewed by:	mjg (previous version), zlei (previous version)
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D42927
2023-12-07 13:34:47 +01:00
Ronald Klop
3878bbf1bb Teach if_smsc to get MAC from bootargs.
Some Raspberry Pi pass smsc95xx.macaddr=XX:XX:XX:XX:XX:XX as bootargs.
Use this if no ethernet address is found in an EEPROM.
As last resort fall back to ether_gen_addr() instead of random MAC.

PR:	274092
Reported by:	Patrick M. Hausen (via ML)
Reviewed by:	imp, karels, zlei
Tested by:	Patrick M. Hausen
Approved by:	karels
MFC after:	1 month
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D42463
2023-12-07 12:32:01 +01:00
Gleb Smirnoff
5b0010b467 if_tuntap: fix NOIP build
Note: this removes one TUNDEBUG() for the sake of not having one more
ifdefed variable declaration and for the overall code brevity.  The call
from tuntap into LRO can be so easily traced with dtrace(1) that an
80-ish printf(9)-based debugging can be omitted.

Fixes:	99c79cab42
2023-12-04 10:18:56 -08:00
Gleb Smirnoff
0fac350c54 sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simplier syscalls that return socket addresses.  Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D42694
2023-11-30 08:31:10 -08:00
Kristof Provost
44f323ecde pf: implement DIOCGETRULES via netlink
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-11-27 21:36:49 +01:00