There is no reason for this fallback to be conditional on COMPAT_FREEBSD12.
PR: 273539
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: melifaro, allanjude
Differential Revision: https://reviews.freebsd.org/D41717
(cherry picked from commit b451dcc84f1cfd1d14ede8a53d1d8359c9b85c94)
Approved by: re (gjb)
For outbound traffic, the M_VLANTAG flag is set in the mbuf packet header to
indicate that the underlying interface should do hardware VLAN tag insertion
if capable; otherwise the net stack does the 802.1Q encapsulation instead.
Commit 868aabb470 introduced per-flow priority, which sets the priority ID
in the mbuf packet header. There is a corner case: when hardware VLAN tag
insertion is disabled on the driver and the net stack does the 802.1Q
encapsulation, the result is double-tagged packets if the driver does
not check the enabled capability (hardware VLAN tag insertion).
Unfortunately some drivers, currently known to include cxgbe(4), re(4),
ure(4), igc(4) and vmx(4), have this issue. From a quick review of other
interface drivers I believe many more have the same issue. It makes more
sense to fix this in the net stack than to try to change every single driver.
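The corner case can be illustrated with a small userland model. This is a sketch under stated assumptions, not the committed change: struct sk_pkt, the SK_* constants and sk_vlan_output() are hypothetical stand-ins for the mbuf header, M_VLANTAG and IFCAP_VLAN_HWTAGGING. It shows one way the stack can guarantee a single tag whether or not the driver checks its enabled capabilities.

```c
#include <stdint.h>

/* Hypothetical stand-ins for kernel flags; the values are illustrative. */
#define SK_M_VLANTAG		0x1u	/* packet carries an out-of-band tag */
#define SK_CAP_VLAN_HWTAG	0x1u	/* hardware tag insertion is enabled */

struct sk_pkt {
	uint32_t flags;		/* SK_M_VLANTAG when the tag is out-of-band */
	uint16_t vtag;		/* PCP (3 bits), DEI (1 bit), VID (12 bits) */
	int	 inline_tags;	/* number of 802.1Q headers in the frame */
};

/* Tag the packet exactly once, in hardware or in software. */
void
sk_vlan_output(struct sk_pkt *p, uint32_t capenable, uint16_t vid, uint8_t pcp)
{
	uint16_t tag = (uint16_t)(((pcp & 0x7) << 13) | (vid & 0x0fff));

	if (capenable & SK_CAP_VLAN_HWTAG) {
		/* The hardware inserts the tag: hand it over out-of-band. */
		p->vtag = tag;
		p->flags |= SK_M_VLANTAG;
	} else {
		/*
		 * Software encapsulation: the tag goes in-line, and any
		 * out-of-band state (for example one left by per-flow
		 * priority) is cleared so a driver cannot tag again.
		 */
		p->inline_tags++;
		p->vtag = 0;
		p->flags &= ~SK_M_VLANTAG;
	}
}
```

With hardware tagging disabled, a packet that arrives with SK_M_VLANTAG already set leaves with one in-line tag and the flag cleared.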
PR: 270736
Reviewed by: kp
Approved by: re (gjb)
Fixes: 868aabb470 Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39499
(cherry picked from commit b22aae410bc7e4e9a6b43e556dc34be72deadb65)
In rS360398, a new iflib device method was added to opt out of VLAN
events needing an interface reset.
I am switching the default to not requiring a restart for:
* VLAN events
* unknown events
After fixing various bugs, I do not think a restart is commonly needed
by hardware, and it is undesirable from the user's perspective because it
causes link flaps and much slower VLAN configuration. Currently, there are no
other restart events besides VLAN events, and setting the
ifdi_needs_restart default to false will alleviate the need to churn
every driver if an odd event is added in the future for specific
hardware.
markj points out this could cause churn in the other direction; I will
solve that problem with an event registration system as he mentions in
the review should we need it in the future.
These drivers will opt into restart and need further inspection or work:
* ixv (needs code audit, 61a8231 fixed principal issue; re-init probably
not necessary)
* axgbe (needs code audit; re-init probably not necessary)
* iavf (needs code audit; interaction with Malicious Driver Detection
mentioned in rS360398)
* mgb - no VLAN functions are currently implemented. Left a comment.
MFC after: 2 weeks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41558
pf_route() sends traffic to a specified next hop over a specific
interface. The next hop is obtained in pf_map_addr() but the interface
is obtained directly via r->rpool.cur->kif outside of the lock held in
pf_map_addr() in multiple places around pf. The chosen interface is not
stored in the source node.
Move the interface selection into pf_map_addr(), have the function
return it together with the chosen IP address, store it in the source
node (struct pf_ksrc_node) and use the stored value when needed.
Sponsored by: InnoGames GmbH
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41570
In commit c7cffd65c5 the function ether_8021q_frame() was slightly
refactored to take a pointer to struct ether_8021q_tag as the parameter
qtag, in order to include the new option proto.
It is wrong to write to qtag->pcp, as that changes the memory that qtag
points to. Unfortunately the transmit routine of if_vlan passes a pointer
to the ifv_qtag member of its softc, which stores the vlan interface's
PCP internally, so transmitting mbufs that carry a PCP overwrites the vlan
interface's PCP.
Fix by operating on a local copy of qtag->pcp. Also mark 'struct ether_8021q_tag'
as const so that compilers can catch this kind of bug.
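A minimal sketch of the pattern follows, assuming a reduced, hypothetical struct sk_qtag and helper sk_build_tci() rather than the real ether_8021q_frame() signature. The per-packet priority overrides a local copy, and the const qualifier turns an accidental write through qtag into a compile error.

```c
#include <stdint.h>

/* Hypothetical, reduced version of the tag descriptor. */
struct sk_qtag {
	uint16_t vid;
	uint8_t	 pcp;
};

/*
 * Build the 16-bit tag control field.  pkt_pcp < 0 means the packet
 * carries no priority of its own.  qtag is const, so a statement such
 * as "qtag->pcp = ..." would be rejected by the compiler.
 */
uint16_t
sk_build_tci(const struct sk_qtag *qtag, int pkt_pcp)
{
	uint8_t pcp = qtag->pcp;	/* local copy, safe to override */

	if (pkt_pcp >= 0)
		pcp = (uint8_t)(pkt_pcp & 0x7);
	return ((uint16_t)(((pcp & 0x7) << 13) | (qtag->vid & 0x0fff)));
}
```

After a packet with its own priority is sent, the interface's stored PCP is unchanged, so the next packet without a priority still gets the interface default.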
PR: 273304
Reviewed by: kp
Fixes: c7cffd65c5 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39505
When we receive a packet and remove the encapsulating layer we should
also clear out protocol flags and any mbuf tags.
If we do not we risk confusing firewalls filtering the tunneled packet.
See also: https://redmine.pfsense.org/issues/14682#change-69073
Sponsored by: Rubicon Communications, LLC ("Netgate")
vlan_capabilities(), used by the IFCAP ioctl, was not respecting the
IFCAP_LRO bit if it was masked by the requestor.
This prevented if_bridge(4) from automasking LRO with a message like:
bridge0: can't disable some capabilities on em3.11: 0x400
This also prevented manually disabling LRO from any vlan interface.
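The difference between honouring and ignoring the requested mask can be sketched as below. The function names and SK_CAP_TXCSUM are hypothetical; SK_CAP_LRO simply reuses the 0x400 value shown in the bridge message, and the "ignoring" variant is an illustration of the described symptom, not a copy of the old code.

```c
#include <stdint.h>

#define SK_CAP_LRO	0x400u	/* the bit reported in the message above */
#define SK_CAP_TXCSUM	0x002u	/* hypothetical second capability */

/* Behaviour described as broken: LRO follows the parent only. */
uint32_t
sk_capenable_ignoring_mask(uint32_t parent_cap, uint32_t requested)
{
	uint32_t cap = parent_cap & requested;

	if (parent_cap & SK_CAP_LRO)
		cap |= SK_CAP_LRO;	/* requested mask is not consulted */
	return (cap);
}

/* Behaviour after the fix: nothing is enabled unless it was requested. */
uint32_t
sk_capenable_respecting_mask(uint32_t parent_cap, uint32_t requested)
{
	return (parent_cap & requested);
}
```

When the requestor masks out LRO, only the second variant returns a value without the 0x400 bit, which is what allows a bridge or an administrator to disable LRO on the vlan interface.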
PR: 254596
Reported by: Paul Vixie <paul@redbarn.org>
MFC after: 1 week
This reverts commit 5f11a33cee.
As requested by Kevin Bowling. He explains:
> The subtle bug was that vlan_capabilities() in if_vlan was not obeying
> the requested mask from its IFCAP ioctl.
If the parent interface is not a bridge and can do LRO and
checksum offloading on VLANs, then guess it may do LRO on VLANs.
A false positive here costs nothing, while a false negative may lead
to some confusion. According to Wikipedia:
"LRO should not operate on machines acting as routers, as it breaks
the end-to-end principle and can significantly impact performance."
The same reasoning applies to machines acting as bridges.
PR: 254596
MFC after: 3 weeks
In iflib_init_locked(), sctx and scctx both point to the same value,
which is the ifc_softc_ctx field in the iflib softc. Remove the
declaration and assignment to sctx since scctx can be used instead, and
the name of scctx follows the naming convention used for local variables
that point to ifc_softc_ctx.
In theory there should be no functional impact with this change.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: kbowling@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D41325
This helps align some of the code with the rest of the style used in
iflib, but as marius@ points out, this is not style(9).
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: kbowling@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D41324
This code was used by the first incarnation of wg(4) and has been dead
ever since f187d6dfbf removed the latter again. Moreover, this code fit
iflib(4) like a square peg in a round hole, was incomplete and, despite
some hacks, was still tailored to VPC and wg(4) rather than generic. In
effect, this reverts the following:
09f6ff4f1a (w/ its "ancillary changes")
9aeca213241f93e931d90f9544d03e0dd691b412
Reviewed by: erj, kbowling
Differential Revision: <https://reviews.freebsd.org/D41196>
During a driver reload stress test, after 50-300 reloads a panic occurs.
After adding sleeps in between loading and unloading the driver, the
issue does not occur. It's possible that loading/unloading too fast may
cause the gt_taskqueue pointer to be freed earlier than expected;
checking for a null pointer first fixes it.
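The guard can be sketched as follows. Only the gt_taskqueue field name comes from the text above; struct sk_taskqueue, its tq_tasks counter and sk_grouptask_detach() are hypothetical, and the model is single-threaded, so it does not reproduce the race itself.

```c
#include <stddef.h>

struct sk_taskqueue {
	int tq_tasks;			/* tasks still attached */
};

struct sk_grouptask {
	struct sk_taskqueue *gt_taskqueue;	/* NULL once detached */
};

/* Returns 1 if the task was detached, 0 if there was nothing to do. */
int
sk_grouptask_detach(struct sk_grouptask *gt)
{
	if (gt->gt_taskqueue == NULL)
		return (0);		/* already torn down: do not touch it */
	gt->gt_taskqueue->tq_tasks--;	/* would fault on a NULL pointer */
	gt->gt_taskqueue = NULL;
	return (1);
}
```

A second detach on the same task becomes a harmless no-op instead of a dereference of a pointer that is no longer valid.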
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: erj@
Tested by: jeffrey.e.pieper@intel.com
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D39457
With the current implementation of if_bridge(4), bridge_enqueue()
calls gif_transmit() only. Ensure it stays that way so that the
expectations in both drivers are either met or changed accordingly.
PR: 227450
The code added in c89c8a1029 in order to compensate for possible
misalignment caused by prepending the IP4/6 header with an EtherIP one
got broken at some point by a rewrite of gif(4). For better or worse,
8018ac153f
relaxed the alignment of struct ip from 32 bit to 16 bit, though. As
a result, a 16 bit offset of the IPv4 header induced by the addition
of the 16 bit EtherIP one no longer is a problem in the first place.
The alignment of struct ip6_hdr currently is even only 8 bit, making
it even less problematic with regard to possible misalignment.
Thus, remove the code for handling misalignment in in{,6}_gif_output()
altogether again.
While at it, replace the 3 bcopy(9) calls in gif(4) with memcpy(9) as
there's no need to handle overlap here.
Send an SCTP Abort message if we're refusing a connection, just like we
send a RST for TCP.
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40864
Basic state tracking for SCTP. This means we scan through the packet to
identify the different chunks (so we can identify state changes).
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40862
A panic occurs when loading the driver with kldload. It has been present
since netlink was enabled. The cause is double locking of ctx. This fix
allows ether_ifattach() to be called without ctx locked.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
PR: 271768
Reviewed by: erj@, jhb@
MFC after: 1 day
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D40557
Actions applied to a processed packet come from a rule in the case of
stateless firewalling, or from a state in the case of stateful firewalling.
The state obtains the actions from a rule when it is created by a
rule or by pfsync. The logic for deciding if actions come from a rule or
a state is spread across many places in pf.
There already is struct pf_rule_actions in struct pf_pdesc and thus it
can be used as a central place for storing actions and their parameters.
OpenBSD does something similar: they also store the actions in struct
pf_pdesc and have no variables in pf_test() but they use separate
variables instead of a structure. By using struct pf_rule_actions we can
simplify the code even further. Actions are applied *only* in
pf_rule_to_actions(), whether for the legacy scrub rules or for the
normal match / pass rules. The choice between rule and state actions
is made only once, in pf_test(), by copying the whole struct.
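The "choose once, copy the whole struct" idea might look roughly like this. All types and field names here are hypothetical reductions (the real struct pf_rule_actions has different members), and sk_pick_actions() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_rule_actions {	/* hypothetical fields */
	uint16_t dnpipe;
	uint8_t	 set_prio;
};

struct sk_rule  { struct sk_rule_actions act; };
struct sk_state { struct sk_rule_actions act; };
struct sk_pdesc { struct sk_rule_actions act; };

/*
 * Decide once where the actions come from and copy them into the
 * packet descriptor; later code only ever reads pd->act.
 */
void
sk_pick_actions(struct sk_pdesc *pd, const struct sk_rule *r,
    const struct sk_state *s)
{
	/* Stateful: the state's copy wins.  Stateless: the matching rule's. */
	pd->act = (s != NULL) ? s->act : r->act;
}
```

Because the decision lives in one assignment, code that applies the actions does not need to know whether a state exists.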
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41009
The variable storing the direction of a processed packet is passed
around to many functions. Most of those functions already have a pointer
to struct pf_pdesc which also contains the direction. By using the one
in struct pf_pdesc we can reduce the number of arguments passed around.
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41008
- Add new DB_DEFINE_TABLE and DB_DECLARE_TABLE macros to define new
command tables. DB_DECLARE_TABLE is intended for use in headers
similar to MALLOC_DECLARE and SYSCTL_DECL.
DB_DEFINE_TABLE takes three arguments, the name of the parent table,
the command name, and the name of the table itself, e.g.
DB_DEFINE_TABLE(show, foo, show_foo) defines a new "show foo" table.
- DB_TABLE_COMMAND, DB_TABLE_COMMAND_FLAGS, DB_TABLE_ALIAS, and
DB_ALIAS_FLAGS allow new commands and aliases to be defined. These
are similar to the existing DB_COMMAND, etc. except that they take
an initial argument giving the name of the parent table, e.g.:
DB_TABLE_COMMAND(show_foo, bar, db_show_foo_bar)
defines a new "show foo bar" command.
This provides a cleaner interface than the ad-hoc use of internal
macros like _DB_SET that was required previously (e.g. in cxgbe(4)).
This retires the DB_FUNC macro as well as the internal _DB_FUNC macro.
Reviewed by: melifaro, kib, markj
Differential Revision: https://reviews.freebsd.org/D40819
If we're called on an mbuf that has passed through codel before, it may
already contain the MTAG_CODEL tag. The code accounts for this and does
not allocate a new mtag. However, it inserts the mtag unconditionally.
That is, it inserts the existing mtag a second time.
When the mbuf later gets freed we iterate over the list of mtags to free
them one by one, and we'll end up freeing an mtag that's already been
freed.
Only insert the mtag if we've allocated a new one. If we found one
there's no need to insert it again.
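The find-or-allocate pattern can be modelled in userland as below. This is a sketch with hypothetical types (struct sk_mtag, struct sk_mbuf) and a plain singly linked list instead of the kernel's m_tag API; in this model, linking an already-linked tag a second time would make the list point back at itself.

```c
#include <stdlib.h>

#define SK_MTAG_CODEL	1	/* hypothetical tag type */

struct sk_mtag {
	int		 type;
	struct sk_mtag	*next;
};

struct sk_mbuf {
	struct sk_mtag	*tags;	/* singly linked list of tags */
};

struct sk_mtag *
sk_mtag_find(struct sk_mbuf *m, int type)
{
	struct sk_mtag *t;

	for (t = m->tags; t != NULL; t = t->next)
		if (t->type == type)
			return (t);
	return (NULL);
}

/* Return the packet's codel tag, allocating and linking one only if absent. */
struct sk_mtag *
sk_codel_tag(struct sk_mbuf *m)
{
	struct sk_mtag *t = sk_mtag_find(m, SK_MTAG_CODEL);

	if (t != NULL)
		return (t);	/* already on the list: do not insert again */
	t = calloc(1, sizeof(*t));
	if (t == NULL)
		return (NULL);
	t->type = SK_MTAG_CODEL;
	t->next = m->tags;	/* insert only the newly allocated tag */
	m->tags = t;
	return (t);
}

/* Free every tag once; returns how many were freed. */
int
sk_mtag_free_all(struct sk_mbuf *m)
{
	int n = 0;

	while (m->tags != NULL) {
		struct sk_mtag *t = m->tags;

		m->tags = t->next;
		free(t);
		n++;
	}
	return (n);
}
```

Calling sk_codel_tag() twice on the same packet returns the same tag, and the packet still frees exactly one tag.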
See also: https://redmine.pfsense.org/issues/14497
Sponsored by: Rubicon Communications, LLC ("Netgate")
The value stored in pf_mtag->tag comes from "tag" and "match tag"
keywords in pf.conf and must not be abused for storing other
information. A ruleset with enough tags could set or remove the bits
responsible for PF_TAG_SYNCOOKIE_RECREATED.
Move this syncookie status to pf_mtag->flags. Rename this and other
related constants in a way that will prevent such mistakes in the
future. Move PF_REASSEMBLED constant to mbuf.h and rename accordingly
because it's not a flag stored in pf_mtag, but an identifier of a
different m_tag. Change the value of the constant to avoid conflicts
with other m_tags using MTAG_ABI_COMPAT.
Rename the variables in pf_build_tcp() and pf_send_tcp() to reduce
confusion.
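Why sharing the tag field is fragile can be shown with a small model. The bit values, struct sk_pf_mtag and both helper functions are hypothetical and chosen only for illustration; the real constants and layout differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: a status bit borrowed from the tag space ... */
#define SK_OLD_SYNCOOKIE_BIT		0x80u
/* ... versus a dedicated bit in a separate flags field. */
#define SK_FLAG_SYNCOOKIE_RECREATED	0x01u

struct sk_pf_mtag {
	uint16_t tag;	/* ruleset tag ID, owned by "tag" / "match tag" */
	uint8_t	 flags;	/* internal status bits */
};

/* Old scheme: the status shares storage with the ruleset tag ID. */
bool
sk_old_is_recreated(const struct sk_pf_mtag *t)
{
	return ((t->tag & SK_OLD_SYNCOOKIE_BIT) != 0);
}

/* New scheme: the status lives in flags and cannot collide with tags. */
bool
sk_new_is_recreated(const struct sk_pf_mtag *t)
{
	return ((t->flags & SK_FLAG_SYNCOOKIE_RECREATED) != 0);
}
```

In this model a ruleset that reaches tag ID 128 makes the old check report a syncookie status that was never set, while the flags-based check is unaffected.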
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D40587
Use the already populated PFE_SKIP_DST_ADDR and extend the skip
infrastructure to also skip on IP source/destination addresses.
This should make evaluating the rules slightly faster.
Reported by: R. Christian McDonald <rcm@rcm.sh>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40567
This change exports interface capabilities using the standard
Netlink attribute type, bitset, and switches `ifconfig(8)` to use
it when displaying interface data.
Bitset comes in two representations. The first one is "compact",
where the bits are exported via two arrays - "mask" listing the
"valid" bits and "values", providing the values for those bits.
The second one is more verbose, listing each bit as a separate item,
with its name, id and value. The latter option is handy when submitting
update requests.
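The compact representation can be read and applied as sketched below. This assumes a single 32-bit word per array for brevity, and struct sk_bitset and both functions are hypothetical names rather than part of the Netlink API.

```c
#include <stdint.h>

/* Single-word compact bitset; the real attribute carries arrays. */
struct sk_bitset {
	uint32_t mask;		/* which bits are valid */
	uint32_t values;	/* values for the valid bits */
};

/* Returns 1 or 0 for a valid bit, -1 if the mask does not cover the bit. */
int
sk_bitset_get(const struct sk_bitset *bs, unsigned int bit)
{
	uint32_t b;

	if (bit >= 32)
		return (-1);
	b = UINT32_C(1) << bit;
	if ((bs->mask & b) == 0)
		return (-1);
	return ((bs->values & b) != 0);
}

/* Apply the bitset to a capability word, touching only the valid bits. */
uint32_t
sk_bitset_apply(const struct sk_bitset *bs, uint32_t cur)
{
	return ((cur & ~bs->mask) | (bs->values & bs->mask));
}
```

For example, with mask 0x6 and values 0x4, bit 1 is cleared, bit 2 is set, and every other bit of the current word is left alone.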
The support for setting capabilities will be added in the upcoming diffs.
Differential Revision: https://reviews.freebsd.org/D40331
While if_epair has no issues doing this we should drop those packets
anyway, because it improves the fidelity of the automated tests.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40397
If we route-to (or dup-to/reply-to) we re-run pf_test(), which will also
create states for the connection.
This means that we may end up matching a different state (i.e. not the
state that was created by the route-to rule), without the attributes
(such as dummynet pipes/queues) set by the route-to rule.
Address this by inheriting the pf_rule_actions from the route-to rule
while evaluating the connection again in pf_test(). That is, we set
default pf_rule_actions based on the route-to rule for the new
evaluation. The new rule may still overrule these, but if it does not
have such actions the route-to actions are applied.
Do the same for IPv6 rules in pf_test6()/pf_route6().
See also: https://redmine.pfsense.org/issues/14039
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40340
Some context on the current IPv6 interface setup & address management:
There are two data paths for IPv6 initialisation in the context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, a devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.
2) Userland adds an IPv4 or IPv6 address to the interface. The SIOCAIFADDR[_IN6]
kernel handler calls the IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. The Ethernet/loopback ioctl handlers
silently set IFF_UP for the interface. Finally, the if.c:ifioctl() wrapper code
compares the old and new interface flags and, if IFF_UP was added, explicitly
calls in6_if_up(), which adds a link-local address if either the original
address is IPv6 or the interface is loopback.
In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger the event handler event, does not call the carp
hook and does not provide any userland notification.
This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D40332
MFC after: 2 weeks
b0e38a1373 improved if_bridge's ability to cope with different VLANs,
but it failed to update bridge_transmit() to cope with the new rule that
untagged packets are treated as having VLAN ID 0 (rather than 1, as used
to be the case).
Fix that oversight.
PR: 270559
Reviewed by: kp
tap(4) devices advertise themselves as just 'ethernet autoselect',
without duplex or speed capabilities.
This advertisement makes them unable to be aggregated into lacp-based
lagg(4):
- lacp code requires underlying interfaces to be full-duplex, else
interface will not participate in lacp at all
- lacp code requires underlying interface to have non-zero speed, else
this interface can not be selected as active aggregator
PR: 217374
Reported-by: Alexandre Snarskii <snar@snar.spb.ru>
Co-authored-by: Mina Galić <freebsd@igalic.co>
Reviewed-by: imp,karles
Pull-request: https://github.com/freebsd/freebsd-src/pull/745
Make struct pfsync_state contents configurable by sending out new
versions of the structure in separate subheader actions. Both old and
the new version of struct pfsync_state can be understood, so replication of
states from a system running an older kernel is possible. The version
being sent out is configured using ifconfig pfsync0 … version XXXX. The
version is a user-friendly string - 1301 stands for FreeBSD 13.1 (I
have checked synchronization against a host running 13.1), 1400 stands
for 14.0.
A host running an older kernel will just ignore the messages and count
them as "packets discarded for bad action".
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39392
- Fix netgdb's double ack.
- Move ack responsibility to debugnet; the decision to ack is made by
  netdump/netgdb.
- Move finish responsibility to debugnet, with a new finish handler.
- netgdb now prints the address to connect to in case the user doesn't have
access to the proxy machine.
Sponsored by: Dell EMC
Reviewed By: markj, bdrewery (earlier version)
Differential Revision: https://reviews.freebsd.org/D40064
The intent is to set the value to UINT32_MAX, not to |= UINT32_MAX.
Happily the intent (ensure that we do not send further packets) is
achieved either way.
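The equivalence is plain arithmetic: OR-ing any 32-bit value with all ones yields all ones. A tiny check, using hypothetical helper names:

```c
#include <stdint.h>

/* What the code did: OR the counter with all ones. */
uint32_t
sk_or_max(uint32_t seq)
{
	seq |= UINT32_MAX;
	return (seq);
}

/* What was intended: assign all ones. */
uint32_t
sk_set_max(void)
{
	return (UINT32_MAX);
}
```

Both forms give the same result for every starting value, so the change is about stating the intent clearly rather than about behaviour.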
Reported by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
if_ovpn already notified userspace when there was a risk of sequence
number re-use, but it trusted userspace to actually rotate the key.
Convert the internal sequence number counter to 64 bits so we can detect
overflows and then refuse to send packets.
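A sketch of the idea, assuming the on-wire sequence number is 32 bits wide: keeping the counter in 64 bits means it can step past UINT32_MAX without wrapping, which makes exhaustion detectable. struct sk_key and sk_next_seq() are hypothetical, and whether the last 32-bit value itself may be used is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_key {
	uint64_t tx_seq;	/* next sequence number to hand out */
};

/*
 * Store the next 32-bit sequence number in *out and return true, or
 * return false once the 32-bit space is exhausted.  The wider counter
 * never wraps back to a previously used value.
 */
bool
sk_next_seq(struct sk_key *k, uint32_t *out)
{
	if (k->tx_seq > UINT32_MAX)
		return (false);		/* refuse to send: key must rotate */
	*out = (uint32_t)k->tx_seq++;
	return (true);
}
```

A 32-bit counter would silently wrap to 0 at the same point and start reusing sequence numbers under the same key.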
Event: BSDCan 2023
Reviewed by: Leon Dang <ldang@netgate.com>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40187
When a new bridge_rtnode is added it is added with a NULL brt_dst. The
brt_dst is set after the entry is added. This means there's a small
window where another core could also attempt to add this node, leading
to the code attempting to log that the MAC addresses moved to a new
interface.
Aside from that being a spurious log entry it also panics, because
obif is NULL (and we attempt to dereference it).
Avoid this by setting brt_dst before we insert the bridge_rtnode.
Assert that obif is non-NULL, as an extra precaution.
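The ordering can be sketched as "initialise fully, then publish". The types and functions below are hypothetical reductions; only brt_dst and obif are names from the text. The model is single-threaded, so it shows the ordering and the assertion but not the locking or memory ordering the kernel code relies on.

```c
#include <assert.h>
#include <stddef.h>

struct sk_ifp {
	int id;
};

struct sk_rtnode {
	struct sk_ifp	 *brt_dst;
	struct sk_rtnode *next;
};

struct sk_rttable {
	struct sk_rtnode *head;
};

/* Set the destination first, then make the node visible in the table. */
void
sk_rtnode_insert(struct sk_rttable *t, struct sk_rtnode *n, struct sk_ifp *dst)
{
	n->brt_dst = dst;	/* fully initialise before publishing */
	n->next = t->head;
	t->head = n;		/* publish last */
}

/* Move an entry to a new interface; returns the old interface's id. */
int
sk_rtnode_move(struct sk_rtnode *n, struct sk_ifp *nif)
{
	struct sk_ifp *obif = n->brt_dst;

	assert(obif != NULL);	/* a published node always has a destination */
	n->brt_dst = nif;
	return (obif->id);
}
```

Any lookup that finds the node through the table therefore sees a non-NULL brt_dst, so the "address moved" path has a valid old interface to report.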
Reported by: olivier@
Reviewed by: zlei@
Differential Revision: https://reviews.freebsd.org/D40147
Prepare for rtableid being included in struct pfsync_state where it will
be int32_t. Make variables which will be set to and from it the same
width.
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D40013
Nexthop groups can be referenced by the external code. The reference
can be released after the VNET destruction. Furthermore, nexthop
groups use a single per-rib lock, which is destroyed during the
VNET destruction. To eliminate use-after-free problem, each nhg
is marked as "unlinked" during the VNET destruction stage, leaving
nhg_idx intact. Normally there should not be such nexthops, but if
there are any, the kernel will panic on 'gr_idx != 0' when the
last nhg reference is released.
Address this by using the assert checks only when the nexthop group
is destroyed during "valid" VNET lifetime.
MFC after: 3 days