Using the same random jitter for multiple rate limits allows an
attacker to use one rate limiter to figure out the current jitter
and then use this knowledge to de-randomize the other rate limiters.
This can be mitigated by using a separate randomized jitter for each
rate limiter.
This issue was reported as issue number 10 in Keyu Man et al.:
SCAD: Towards a Universal and Automated Network Side-Channel
Vulnerability Detection
Reviewed by: rrs, Peter Lei, glebius
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48804
As with net.inet.{tcp,udp}.bind_all_fibs, this causes raw sockets to
accept only packets from the same FIB.
Reviewed by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48707
In particular, we store a FIB number in both struct socket and in struct
inpcb. When updating the FIB number with setsockopt(SO_SETFIB), make
the update atomic. This is required to support the new bind_all_fibs
mode, since in that mode changing the FIB of a bound socket is not
permitted.
This requires a bit more code, but avoids a layering violation in
sosetopt(), where we hard-code the list of protocol families that
implement SO_SETFIB.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48666
Introduce the net.inet.udp.bind_all_fibs tunable, set to 1 by default
for compatibility with current behaviour. When set to 0, all received
datagrams will be dropped unless an inpcb bound to the same FIB exists.
No functional change intended, as the new behaviour is not enabled by
default.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48664
Allow protocol layers to look up an inpcb belonging to a particular FIB.
This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be
used is obtained from the specificed mbuf or ifnet.
No functional change intended.
Reviewed by: glebius, melifaro
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48662
Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB
number. When this flag is specified, duplicate bindings are permitted,
so long as each FIB contains at most one inpcb bound to the same
address/port. If an inpcb is bound with this flag, it'll have the
INP_BOUNDFIB flag set.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48661
This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"fib" parameter to in_pcblookup() and related functions, and plumb it
through.
A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB
numbers when searching. Otherwise, it should refer to a valid FIB
number, and the returned inpcb should belong to the specific FIB. For
now, just add the fib parameter where needed, as there are several
layers to plumb through.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48660
To comply with Common Criteria certification requirements, it may be
necessary to ensure that packets to 0.0.0.0/::0 are dropped and logged
by the system firewall. Currently, such packets are dropped by
ip_input() and ip6_input() before reaching pfil hooks; let's defer the
checks slightly to give firewalls a chance to drop the packets
themselves, as this gives better observability. Add some regression
tests for this with pf+pflog.
Note that prior to commit 713264f6b8, v4 packets to the unspecified
address were not dropped by the IP stack at all.
Note that ip_forward() and ip6_forward() ensure that such packets are
not forwarded; they are passed back unmodified.
Add a regression test which ensures that such packets are visible to
pflog.
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Klara, Inc.
Sponsored by: OPNsense
Differential Revision: https://reviews.freebsd.org/D48163
Only map mbuf when a policy is looked up and indicates that IPSEC needs
to transform the packet. If IPSEC is inline offloaded, it is up to the
interface driver to request remap if needed.
Fetch the IP header using m_copydata() instead of using mtod() to select
policy/SA.
Reviewed by: markj
Sponsored by: NVidia networking
Differential revision: https://reviews.freebsd.org/D48265
but instead of tripping the assert in debug kernel, and silently falling
into UB for prod, skip IPSEC processing for KTLS framed packets when
mb_unmapped_to_ext() failed.
Reviewed by: markj
Sponsored by: NVidia networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D48265
See commit 4f02a7d739 for more background.
I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INADDR_ANY. Looking at the sdr.V2.4a7n sources (mentioned in
bugzilla PR 7713), there is a CANT_MCAST_BIND hack wherein the
application binds to INADDR_ANY instead of a multicast address, but
CANT_MCAST_BIND isn't defined for FreeBSD builds.
It seems unlikely that we still have a use-case for allowing sockets
from different UIDs to bind to the same port when binding to the
unspecified address. And, as noted in D47832, applications like sdr
would have been broken by the inverted SO_REUSEPORT check removed in
that revision, apparently without any bug reports. Let's break
compatibility and simply disallow this case outright.
Also, add some comments, remove a hack in a regression test which tests
this funtionality, and add a new regression test to exercise the
remaining checks that were added in commit 4658dc8325.
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47870
Actually check the conditions that are enforced by the error checking
code instead of a condition which is
* checking a number to be non-negative instead of positive
* depending on a random number
Perform the checks consistently for ICMPv4 and ICMPv6.
Reviewed by: glebius, rrs, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48001
This check for SO_REUSEPORT was added way back in commit 52b65dbe85.
Per the commit log, this commit restricted this port-stealing check to
unicast addresses, and then only if the existing socket does not have
SO_REUSEPORT set. In other words, if there exists a socket bound to
INADDR_ANY, and we bind a socket to INADDR_ANY with the same port, then
the two sockets need not be owned by the same user if the existing
socket has SO_REUSEPORT set.
This is a surprising semantic; bugzilla PR 7713 gives some additional
context. That PR makes a case for the behaviour described above when
binding to a multicast address. But, the SO_REUSEPORT check is only
applied when binding to a non-multicast address, so it doesn't really
make sense. In the PR the committer notes that "unicast applications
don't set SO_REUSEPORT", which makes some sense, but also refers to
"multicast applications that bind to INADDR_ANY", which sounds a bit
suspicious.
OpenBSD performs the multicast check, but not the SO_REUSEPORT check.
DragonflyBSD removed the SO_REUSEPORT (and INADDR_ANY) checks back in
2014 (commit 0323d5fde12a4). NetBSD explicitly copied our logic and
still has it.
The plot thickens: 20 years later, SO_REUSEPORT_LB was ported from
DragonflyBSD: this option provides similar semantics to SO_REUSEPORT,
but for unicast addresses it causes incoming connections/datagrams to be
distributed among all sockets in the group. This commit (1a43cff92a)
inverted the check for SO_REUSEPORT while adding one for
SO_REUSEPORT_LB; this appears to have been inadvertent. However:
- apparently no one has noticed that the semantics were changed;
- sockets belonging to different users can now be bound to the same port
so long as they belong to a single lbgroup bound to INADDR_ANY, which
is not correct.
Simply remove the SO_REUSEPORT(_LB) checks, as their original
justification was dubious and their current implementation is wrong; add
some tests.
Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47832
For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgroups were not careful to synchronize with unlocked
lookups. I believe that in the worst case this can result in spurious
connection aborts (I have a regression test case to exercise this), but
it's hard to be certain.
Modify in_pcblbgroup* routines to synchronize with unlocked lookup:
- When removing inpcbs from an lbgroup, do not shrink the array.
The maximum number of lbgroup entries is INPCBLBGROUP_SIZMAX (256),
and it doesn't seem worth the complexity to shrink the array when a
socket is removed.
- When resizing an lbgroup, do not insert it into the hash table until
it is fully initialized; otherwise lookups may observe a partially
constructed lbgroup.
- When adding an inpcb to the group, increment the counter after adding
the array entry, using a release store. Otherwise it's possible for
lookups to observe a null array slot.
- When looking up an entry, use a corresponding acquire load.
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48020
A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects, so is a
good candidate to move into its own helper function. This patch does
so, making the callers less complicated and reducing indentation.
While moving this code, also make some changes:
- Load socket options (SO_REUSEADDR etc.) only once. There is nothing
preventing another thread from toggling the socket options, so make
this function easier to reason about by avoiding races.
- When checking whether the bind address is an interface address, make a
separate sockaddr rather than temporarily modifying the one passed to
in_pcbbind().
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47590
- Use the local var "laddr" instead of sin->sin_addr in one block.
- Use in_nullhost() instead of explicit comparisons with INADDR_ANY.
- Combine multiple socket options checks into one.
- Fix indentation.
- Remove some unhelpful comments.
This is in preparation for some simplification and bug-fixing.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47451
in_pcblookup_hash_wild_* looks up unconnected inpcbs, so there is no
point in passing the foreign address and port, and indeed those
parameters are not used. So, remove them.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47385
Found while auditing calls to M_WRITABLE to see if M_EXTPG could be
removed from its checks.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D46785
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().
Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 2 weeks
Reviewed by: zlei, allanjude, emaste
Differential Revision: https://reviews.freebsd.org/D46454
The inp_route pointer should only be provided to the network
layer, when no destination address is provided. This is only
one of the conditions, where a write lock is needed.
If, for example, the route is also cached, when the socket is
unbound, problems show up, when the sendto is called, then
connect and finally send, when the route for the addresses
provided in the sendto and connect call use different outgoing
interfaces.
While there, clearly document why the write lock is taken.
Reported by: syzbot+59122d2e848087d3355a@syzkaller.appspotmail.com
Reviewed by: Peter Lei, glebius
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46056
The nd6 code listens for RTM_DELETE events so that it can mark the
corresponding default router as inactive in the case where the default
route is deleted. A subsequent RA from the router may then reinstall
the default route.
Commit fedeb08b6a broke this for non-multipath routes, as
rib_decompose_notification() only invokes the callback for multipath
routes. Restore the old behaviour. Also ensure that we update the
router only for RTM_DELETE notifications, lost in commit 2259a03020.
Reviewed by: bz
Fixes: fedeb08b6a ("Introduce scalable route multipath.")
Fixes: 2259a03020 ("Rework part of routing code to reduce difference to D26449.")
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Bell Tower Integration
Differential Revision: https://reviews.freebsd.org/D46020
To be able to pass ifp and mtu to the ipsec_output() and ipsec
accelerator filter.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44225
Similarly, mtu is needed to decide inline IPSEC offloiad for the driver.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44224
The information about the interface is needed to coordinate inline
offloading of IPSEC processing with corresponding driver.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44223
The code deleted predates FreeBSD history. The comment deleted is 99%
outdated. Why KAME decided to use these constants instead of normal ones
also lost in centuries.
The only element of of in6_addr that is specified in RFC 3493 or
in POSIX.1-2017 is s6_addr, implemented via a #define to a union
member. However, FreeBSD and other BSD systems have additional
definitions for the other union members, s6_addr{8,16,32} which
are defined for the kernel and loader. Some Linux applications
also use them, and they seem to be allowed by the RFC and POSIX.
Remove the current ifdefs, exposing the additional fields to user
level, and replace with #if __BSD_VISIBLE. Add an explanatory
comment expanding on the previous "nonstandard" comment.
MFC after: 1 week
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D44979
in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never used, causing a compiler error:
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:547:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
547 | struct inpcb *inp;
| ^
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:573:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
573 | struct inpcb *inp;
Fix this by guarding all the INET-specific logic, including the variable
definition, behind #ifdef INET.
While here, tweak formatting in in6_mapped_peeraddr() so both functions
are the same.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1155
When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.
Add a static probe point for them so that we can use dtrace to get
futher information (e.g. a stack trace).
For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT
Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504
Use counter_ratecheck() instead of racy and slow ppsratecheck. Use a
separate counter for every currently known type of ICMPv6. Provide logging
of ratelimit events. Provide jitter to counter open UDP port detection.
Reviewed by: tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44482
Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration create self-documenting code. There should be no
functional changes.
Differential Revision: https://reviews.freebsd.org/D44481
The generation of ICMP6_ECHO_REPLY bypasses icmp6_error(), thus rate
limit was not applied.
Reviewed by: tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44480
When profiling an IP6 heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).
To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 )
To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.
As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.
For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output
Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
PR: 277349