Currently af-to works only on inbound interface by creating a reversed
NAT state key which is used to match traffic returning on the outbound
interface.
Such limitation is not necessary. When an af-to state is created
for an outbound rule do not reverse the NAT state key, making it work
just like if it was created for a normal NAT rule. Depending on firewall
design it might be easier and more natural to use af-to on the outbound
interface.
Reviewed by: kp
Approved by: kp (mentor)
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D49122
To support DHCP for IPoIB links, DHCP clients and servers require the
ability to transmit link-layer broadcasts on the IB interfaces. BPF
provides the mechanism for doing this.
This change updates the if_infiniband driver to be capable of accepting
link-layer broadcast requests via BPF using Ethernet formatted frames
(the driver currently registers with BPF as DLT_EN10MB). Only Broadcast
frames can reliably be interpreted using the Ethernet header format so
detect unicast and multicast frames are rejected if passed in using the
Ethernet format. This doesn't impact the ability to support native
unicast, broadcast or multicast frames if native infiniband header
support is added to BPF at a later date.
Further the above, this commit also addresses an issue in the existing
code that can result in separation of part of the packet header from the
rest of the payload if a BPF write was attempted. This was caused by
mbuf preallocation of the infiniband header length regardless of length
of the prepend data.
Reviewed by: rpokala; Greg Foster <gfoster@vdura.com>
Tested by: Greg Foster <gfoster@vdura.com>
MFC after: 1 week
Sponsored by: Vdura
Pull Request: https://github.com/freebsd/freebsd-src/pull/1591
change log(matches) semantics slightly to make it more useful. since it
is a debug tool change of semantics not considered problematic.
up until now, log(matches) forced logging on subsequent matching rules,
the actual logging used the log settings from that matched rule.
now, log(matches) causes subsequent matches to be logged with the log settings
from the log(matches) rule. in particular (this was the driving point),
log(matches, to pflog23) allows you to have the trace log going to a seperate
pflog interface, not clobbering your regular pflogs, actually not affecting
them at all.
long conversation with bluhm about it, which didn't lead to a single bit
changed in the diff but was very very helpful. ok bluhm as well.
Obtained from: OpenBSD, henning <henning@openbsd.org>, f61b1efcce
Sponsored by: Rubicon Communications, LLC ("Netgate")
pfi_kkif_attach() annotates the kif with a flag indicating it is the "any" match.
pfi_kif_match obeys() that flag.
ok benno
Obtained from: OpenBSD, henning <henning@openbsd.org>, 4be478ce5d
Sponsored by: Rubicon Communications, LLC ("Netgate")
No change to the underlying type, so no ABI change.
We define __time_t as uint64_t if __LP64__, otherwise uint32_t,
and only define __LP64__ if long is 64 bits.
In other words: __time_t == long.
ok henning@ deraadt@
Obtained from: OpenBSD, guenther <guenther@openbsd.org>, 6c1b69a0ff
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D48963
For every state pf creates up to two source nodes: a limiting one
struct pf_kstate -> src_node and a NAT one struct pf_kstate -> nat_src_node.
The limiting source node is tracking information needed for limits using
max-src-states and max-src-nodes and the NAT source node is tracking NAT
rules only.
On closer inspection some issues emerge:
- For route-to rules the redirection decision is stored in the limiting source
node. Thus sticky-address and source limiting can't be used separately.
- Global source tracking, as promised in the man page, is totally absent from
the code. Pfctl is capable of setting flags PFRULE_SRCTRACK (enable source
tracking) and PFRULE_RULESRCTRACK (make source tracking per rule). The kernel
code checks PFRULE_SRCTRACK but ignores PFRULE_RULESRCTRACK. That makes
source tracking work per-rule only.
This patch is based on OpenBSD approach where source nodes have a type and each
state has an array of source node pointers indexed by source node type
instead of just two pointers. The conditions for limiting are applied
only to source nodes of PF_SN_LIMIT type. For global limit tracking
source nodes are attached to the default rule.
Reviewed by: kp
Approved by: kp (mentor)
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39880
Just like TCP and UDP we can fold the SCTP code into pf_test_state().
This does require a dummy variable to hold the protocol checksum, because unlike
TCP and UDP the SCTP checksum is 32-bits. We don't need to change the checksum
though, so simply pointing the pcksum pointer to a safe dummy location suffices
to re-use pf_test_state().
Sponsored by: Rubicon Communications, LLC ("Netgate")
Set it up in pf_setup_pdesc(). ok ryan benno mikeb bluhm
Obtained from: OpenBSD, henning <henning@openbsd.org>, 14255d4d87
Sponsored by: Rubicon Communications, LLC ("Netgate")
We lost the quick flag as soon as we stepped into a child anchor.
Simplify the logic, get rid of the match flag in the anchor stack, just
use the match variable we already had (and used in a boolean style) to track
the nest level we had a match at. When a child anchor had a match we also
have a match in the current anchor, so update the match level accordingly,
and thus correctly honour the quick flag.
Reported by, along with the right idea on how to fix this, by Sean Gallagher
\sean at teletech.com.au/, who also helped testing the fix. ok ryan & benno
Obtained from: OpenBSD, henning <henning@openbsd.org>, 32a028bff7
Sponsored by: Rubicon Communications, LLC ("Netgate")
Renumber 1000BASE-BX and add 100BASE-BX sequentially
I added this 1000BASE-BX in 78c63ed260 but
did not connect it to any code yet, appologize for the churn.
MFC after: 3 days
The newly introduced function bpf_ifdetach() is only available when
device bpf is enabled.
Fixes: 1ed9b381d4 ifnet: Detach BPF descriptors on interface vmove event
Commit 20c4899a8e modified pf_test_eth_rule() to not acquire the
rules read lock, so pf_commit_eth() was changed to wait until the
now-inactive rules are no longer in use before freeing them. In
particular, it uses the net_epoch to schedule callbacks once the
inactive rules are no longer visible to packet processing threads.
However, since commit 812839e5aa, pf_test_eth_rule() acquires the
rules read lock, so this deferred action is unneeded. This patch
reverts a portion of 20c4899a8e such that we avoid using deferred
callbacks to free inactive rules.
The main motivation is performance: epoch_drain_callbacks() is quite
slow, especially on busy systems, and its use in the DIOCXBEGIN handler
in particular causes long stalls in relayd when reloading configuration.
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D48822
In particular, we store a FIB number in both struct socket and in struct
inpcb. When updating the FIB number with setsockopt(SO_SETFIB), make
the update atomic. This is required to support the new bind_all_fibs
mode, since in that mode changing the FIB of a bound socket is not
permitted.
This requires a bit more code, but avoids a layering violation in
sosetopt(), where we hard-code the list of protocol families that
implement SO_SETFIB.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48666
remove confuzzling comment
:dlg: the xxx can go
...and this time commit to the real repo and not the one on my laptop
Obtained from: OpenBSD, henning <henning@openbsd.org>, 15e15606eb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Just like we do for IPv6, generate an ICMP fragmentation needed packet if we're
going to need fragmenation for IPv4 as well (i.e. DF is set). Do so before full
processing, so we generate it with pre-NAT addreses, just as we do for IPv6.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D48805
When an interface is moving to/from a vnet jail, it may still have BPF
descriptors attached. The userland (e.g. tcpdump) does not get noticed
that the interface is departing and still opens BPF descriptors thus
may result in leaking sensitive traffic (e.g. an interface is moved
back to parent jail but a user is still sniffing traffic over it in
the child jail).
Detach BPF descriptors so that the userland will be signaled.
Reviewed by: ae
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D45727
if_detach_internal() never fail since change [1]. As a consequence,
also does its caller if_vmove(). While here, remove a stall comment.
No functional change intended.
This reverts commit c7bab2a7ca.
[1] a779388f8b if: Protect V_ifnet in vnet_if_return()
Reviewed by: glebius
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D48820
There're two possible race conditions,
1. Concurrent bpfattach() and bpf_setif(), i.e., BIOCSETIF ioctl,
2. Concurrent bpfdetach() and bpf_setif().
For the first case, userland may see BPF interface attached but it has
not been in the attached interfaces list `bpf_iflist` yet. Well it
will eventually be so this case does not matter.
For the second one, bpf_setif() may reference `dead_bpf_if` and the
kernel will panic (spotted by change [1], without the change we will
end up silently corrupted memory).
A simple fix could be that, we add additional check for `dead_bpf_if`
in the function `bpf_setif()`. But that requires to extend protection
of global lock (BPF_LOCK), i.e., BPF_LOCK should also protect the
assignment of `ifp->if_bpf`. That simple fix works but is apparently
not a good design. Since the attached interfaces list `bpf_iflist` is
the single source of truth, we look through it rather than check
against the interface's side, aka `ifp->if_bpf`.
This change has performance regression, that the cost of BPF interface
attach operation (BIOCSETIF ioctl) goes back from O(1) to O(N) (where
N is the number of BPF interfaces). Well we normally have sane amounts
of interfaces, an O(N) should be affordable.
[1] 7a974a6498 bpf: Make dead_bpf_if const
Fixes: 16d878cc99 Fix the following bpf(4) race condition ...
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D45725
This driver does not need to retrieve those tunable during early boot.
Meanwhile SYSCTL_INT can provide rich info such as description.
Also `sysctl net.link.vxlan.[legacy_port|reuse_port]` can report the
current settings.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D48621
Using one taskqueue group with single thread to execute all admin
tasks may lead to unexpected timeouts when long running task (e.g.
handling a reset after FW update) for one interface prevents
tasks from other interfaces being executed. Taskqueue group API
doesn't let to dynamically add threads, and pre-allocating thread
for each CPU as it's done for traffic queues would be a waste
of resources on systems with small number of interfaces. Replace
global taskqueue group for admin tasks with taskqueue allocated
for each interface to allow independent execution.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: imp, jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1336
There was a limit on the number of pflog interfaces - 16. remove that.
mostly by dynamically allocating pflogifs instead of making that a static
array. ok claudio zinke
Obtained from: OpenBSD, henning <henning@openbsd.org>, ab0a082ea6
Sponsored by: Rubicon Communications, LLC ("Netgate")
As suggested by henning.
Which unbreaks ie route-to after the recent pf changes.
With much help debugging and pointing out of missing bits from claudio@
ok claudio@ "looks good" henning@
Obtained from: OpenBSD, jsg <jsg@openbsd.org>, 7fa5c09028
Sponsored by: Rubicon Communications, LLC ("Netgate")
It is harmless but pointless to invoke vxlan_stop event handler when the
interface was not previously configured. This change will also prevent
an assert panic from t4_vxlan_stop_handler().
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D48494
This is part of the upcoming USB umb(4) work.
Differential Revision: https://reviews.freebsd.org/D48167
Approved by: adrian, zlei
Sponsored by: FreeBSD Foundation
PR: kern/263783
Submitted by: Pierre Pronchery <khorben@defora.org>
Teach dummymbuf to replace mbufs with larger ones.
This can be useful for testing for bugs that depend on mbuf layout.
Sponsored by: Rubicon Communications, LLC ("Netgate")
When we call pf_normalize_ip() or pf_normalize_ip6() we passed the mbuf twice.
Once as m0, and once inside the struct pf_pdesc. Remove the former to avoid
confusion when we free *m0, but don't update pd->m.
This could lead to use-after-free errors e.g. if reassembly failed.
PR: 283705
Reported by: Yichen Chai <yichen.chai@gmail.com>, Zhuo Ying Jiang Li <zyj20@cl.cam.ac.uk>
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Allow users to choose to allow permitted SCTP connections to set up additional
multihomed connections regardless of the ruleset. That is, allow an already
established connection to set up flows that would otherwise be disallowed.
In case of if-bound connections we initially set the extra associations to
be floating, because we don't know what path they'll be taking when they're
created. Once we see the first traffic we can bind them.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D48453
- Don't use bare vnet(4) definitions in the KASSERT, they aren't available
on a kernel without VIMAGE. Just through MPASS() here. This is more of
documenting assertion rather than an assertion that may actually fire on
an unmodified kernel.
- V_netisr_enable is different to the rest of V_ prefixed globals. On a
kernel without VIMAGE it basically doesn't exist, instead of being
present as a single instance.
Fixes: a1be7978f1
This makes their argument list shorter. Also fix a bug where pf_walk_option6()
used the outer header in the pd2 case.
ok henning@ mikeb@
Obtained from: OpenBSD, bluhm <bluhm@openbsd.org>, dfff4707a1
Sponsored by: Rubicon Communications, LLC ("Netgate")
Only map mbuf when a policy is looked up and indicates that IPSEC needs
to transform the packet. If IPSEC is inline offloaded, it is up to the
interface driver to request remap if needed.
Fetch the IP header using m_copydata() instead of using mtod() to select
policy/SA.
Reviewed by: markj
Sponsored by: NVidia networking
Differential revision: https://reviews.freebsd.org/D48265
When a DCO client reconnects (e.g. on server restart) OpenVPN may create a new
socket rather than reusing the existing one. This used to be rejected because we
expect all peers to use the same socket. However, if there are no peers it's
safe to release the previous socket and install the tunnel function on the new
one.
See also: https://redmine.pfsense.org/issues/15928
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
It is not implemented, and most likely cannot be, in a robust manner.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, slavash
Sponsored by: NVidia networking
When we reassemble IPv4 packets tag them just like we tag the IPv6 reassembled
packtes. Use this information as the basis for refragmenting the IPv6 packet.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47804
The general syntax is:
pass in inet from any to 192.168.1.1 af-to inet6 from 2001::1 to 2001::2
In the NAT64 case the "to" is not needed in af-to and the IP is extraced
from the IPv6 dst (assuming a /64 prefix).
Again most work by sperreault@, mikeb@ and reyk@
OK mcbride@, put it in deraadt@
Obtained from: OpenBSD, claudio <claudio@openbsd.org>, 0cde32ce3f
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47790
Since the IPv6 madness is not enough introduce NAT64 -- which is actually
"af-to" a generic IP version translator for pf(4).
Not everything perfect yet but lets fix these things in the tree.
Insane amount of work done by sperreault@, mikeb@ and reyk@.
Looked over by mcbride@ henning@ and myself at eurobsdcon.
OK mcbride@ and general put it in from deraadt@
Obtained from: OpenBSD, claudio <claudio@openbsd.org>, 97326e01c9
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47786
stuff nsaddr/ndaddr/nsport/ndport (addrs/ports after NAT, used a lot while
walking the ruleset and up until state is fully set up) into pf_pdesc instead
of passing around those 4 seperately all the time, also shrinks the argument
count for a few functions that have/partialy had an insane count of arguments.
kinda preparational since we'll need them elsewhere too, soon
ok ryan jsing
Obtained from: OpenBSD, henning <henning@openbsd.org>, ccf63ac6cb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47784
We won't follow this fully, because it involves breaking syntax changes
(removing nat/rdr rules and moving this functionality into regular rules) as
well as behaviour changes because NAT is now done after the rules evaluation,
rather than before it.
We import some related changes anyway, because it paves the way for nat64
support.
This change introduces a new pf_kpool in struct pf_krule, for nat. It is not yet
used (but will be for nat64) and renames the existing 'rpool' to 'rdr'.
Obtained from: OpenBSD, henning <henning@openbsd.org>, 0ef3d4febe
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47783