Commit graph

235 commits

Author SHA1 Message Date
Gleb Smirnoff
74ed2e8ab2 raw ip: fix regression with multicast and RSVP
With 61f7427f02 raw sockets protosw has wildcard pr_protocol.  Protocol
of a specific pcb is stored in inp_ip_p.

Reviewed by:		karels
Reported by:		karels
Differential revision:	https://reviews.freebsd.org/D36429
Fixes:			61f7427f02
2022-09-02 12:17:09 -07:00
Mike Karels
6ca0ca7b4c IPv4 multicast: fix LOR in shutdown path
X_ip_mrouter_done() was calling the interface ioctl routines via
if_allmulti() while holding a write lock.  However, some interface
ioctl routines, including em/iflib and tap, use sxlocks, which are
not permitted while holding a non-sleepable lock, and this elicits
a warning from WITNESS.  Fix the locking issue by recording the
affected interface pointers in a malloc'ed array, and call
if_allmulti() on each after dropping the rwlock.

Reviewed by:	bz
Differential Revision: https://reviews.freebsd.org/D34845
2022-04-11 14:51:16 -05:00
Mike Karels
04cd74b4cd IPv4 multicast: fix netstat -g
The vif structure includes fields at the end which are #ifdef KERNEL,
causing a mismatch between the structure sizes between kernel and
user level.  netstat -g failed with an ENOMEM on the sysctl to fetch
the vif table.  Change the vif sysctl code in ip_mroute to copy out
only the user-level-visible portion of each table entry.

Reviewed by:	bz, wma
Differential Revision: https://reviews.freebsd.org/D34627
2022-03-22 07:38:01 -05:00
Mike Karels
2cf1e120c6 Enter epoch when addding IPv4 multicast forwarding cache entry
The code path from the IPv4 multicast setsockopt could call ip_output()
without entering an epoch.  Specifically, the MRT_ADD_MFC setbsocopt
would call add_mfc(), which in turn called ip_mdq() to send queued
packets.  This resulted in an epoch assert failure in ip_output().
Enter an epoch in add_mfc(), and add some epoch asserts to check
for similar failures.

Reviewed by:	kp, bz, wma, cy
Differential Revision: https://reviews.freebsd.org/D34624
2022-03-22 07:28:57 -05:00
Sylvian Meygret
cd7306bb1f ip_mroute: split mrouter interface deactivation and if_free
Move if_free outside MRW_LOCK. This will silence LOR message
which might appere during deinitialization.
2022-02-04 10:25:07 +01:00
Wojciech Macek
77223d98b6 ip_mroute: refactor epoch-basd locking
Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines.
Remove unnecessary macros as well.

Obtained from:		Semihalf
Spponsored by:		Stormshield
Reviewed by:		glebius
Differential revision:	https://reviews.freebsd.org/D34030
2022-02-02 06:48:05 +01:00
Wojciech Macek
0daa28057c ip_mroute: add unlock in early-exit
Add missing unlock if V_ip_mrotue is not set

Obtained from:		Semihalf
2022-01-22 14:48:47 +01:00
Wojciech Macek
9ce46cbc95 ip_mroute: move ip_mrouter_done outside lock
X_ip_mrouter_done might sleep, which triggers INVARIANTS to
print additional errors on the screen.
Move it outside the lock, but provide some basic synchronization
to avoid race condition during module uninit/unload.

Obtained from:		Semihalf
Sponsored by:		Stormshield
2022-01-21 06:17:19 +01:00
Wojciech Macek
776c34f646 ip_mroute: remove unused variables
Sponsored by:	Stormshield
Obtained from:	Semihalf
2022-01-11 13:06:22 +01:00
Wojciech Macek
68f28dd1cc ip_mroute: do not sleep when lock is taken
Kthread initialization calls uma_alloc which can sleep.
Modify the code to use deferred work instead.
2022-01-11 11:19:32 +01:00
Wojciech Macek
8a727c3df8 mroute: add missing WUNLOCK
Add missing WNLOCK as in all other error cases.

Reported by:		Stormshield
Obtained from:		Semihalf
2021-10-28 07:12:23 +02:00
Wojciech Macek
fb3854845f mroute: fix memory leak
Add MFC to linked list to store incoming packets
before MCAST JOIN was captured.

Sponsored by:		Stormshield
Obtained from:		Semihalf
MFC after:		2 weeks
2021-10-28 07:12:16 +02:00
Wojciech Macek
f61cb12aac mroute: fix locking issues
In some cases the code may fall into deadlock.
Avoid calling epoch_wait when W-lock is taken.

Sponsored by:	Stormshield
Obtained from:	Semihalf
2021-08-13 11:06:17 +02:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Wojciech Macek
17ac6d94db ip_mroute: initialize vif ifnet properly
Use if_alloc to ensure all fields of ifnet are allocated
properly

Reported by:   Damien Deville
Sponsored by:  Stormshield
Obtained from: Semihalf
Reviewed by:   mw
Differential revision: https://reviews.freebsd.org/D30608
2021-06-23 10:13:52 +02:00
Wojciech Macek
d40cd26a86 ip_mroute: rework ip_mroute
Approved by:     mw
Obtained from:   Semihalf
Sponsored by:    Stormshield
Differential Revision: https://reviews.freebsd.org/D30354

Changes:
1. add spinlock to bw_meter

If two contexts read and modify bw_meter values
it might happen that these are corrupted.
Guard only code fragments which do read-and-modify.
Context which only do "reads" are not done inside
spinlock block. The only sideffect that can happen is
an 1-p;acket outdated value reported back to userspace.

2. replace all locks with a single RWLOCK

Multiple locks caused a performance issue in routing
hot path, when two of them had to be taken. All locks
were replaced with single RWLOCK which makes the hot
path able to take only shared access to lock most of
the times.
All configuration routines have to take exclusive lock
(as it was done before) but these operation are very rare
compared to packet routing.

3. redesign MFC expire and UPCALL expire

Use generic kthread and cv_wait/cv_signal for deferring
work. Previously, upcalls could be sent from two contexts
which complicated the design. All upcall sending is now
done in a kthread which allows hot path to work more
efficient in some rare cases.

4. replace mutex-guarded linked list with lock free buf_ring

All message and data is now passed over lockless buf_ring.
This allowed to remove some heavy locking when linked
lists were used.
2021-05-31 05:48:15 +02:00
Wojciech Macek
eedbbec3fd ip_mroute: remove unused declarations
fix build for non-x86 targets
2021-05-21 08:01:26 +02:00
Wojciech Macek
741afc6233 ip_mroute: refactor bw_meter API
API should work as following:
- periodicaly report Lower-or-EQual bandwidth (LEQ) connections
  over kernel socket, if user application registered for such
  per-flow notifications
- report Grater-or-EQual (GEQ) bandwidth as soon as it reaches
  specified value in configured time window

Custom implementation of callouts was removed. There is no
point of doing calout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have a code which can be easily maintained.

This is ia preparation for locking rework which is highly inefficient.

Approved by:    mw
Sponsored by:   Stormshield
Obtained from:  Semihalf
Differential Revision:  https://reviews.freebsd.org/D30210
2021-05-21 06:43:41 +02:00
Wojciech Macek
787845c0e8 Revert "ip_mroute: refactor bw_meter API"
This reverts commit d1cd99b147.
2021-05-20 12:14:58 +02:00
Wojciech Macek
d1cd99b147 ip_mroute: refactor bw_meter API
API should work as following:
- periodicaly report Lower-or-EQual bandwidth (LEQ) connections
  over kernel socket, if user application registered for such
  per-flow notifications
- report Grater-or-EQual (GEQ) bandwidth as soon as it reaches
  specified value in configured time window

Custom implementation of callouts was removed. There is no
point of doing calout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have a code which can be easily maintained.

This is ia preparation for locking rework which is highly inefficient.

Approved by:    mw
Sponsored by:   Stormshield
Obtained from:  Semihalf
Differential Revision:  https://reviews.freebsd.org/D30210
2021-05-20 10:13:55 +02:00
Wojciech Macek
0b103f7237 mrouter: do not loopback packets unconditionally
Looping back router multicast traffic signifficantly
stresses network stack. Add possibility to disable or enable
loopbacked based on sysctl value.

Reported by:    Daniel Deville
Reviewed by:	mw
Differential Revision:	https://reviews.freebsd.org/D29947
2021-05-11 12:36:07 +02:00
Wojciech Macek
65634ae748 mroute: fix race condition during mrouter shutting down
There is a race condition between V_ip_mrouter de-init
    and ip_mforward handling. It might happen that mrouted
    is cleaned up after V_ip_mrouter check and before
    processing packet in ip_mforward.
    Use epoch call aproach, similar to IPSec which also handles
    such case.

Reported by:    Damien Deville
Obtained from:	Stormshield
Reviewed by:	mw
Differential Revision:	https://reviews.freebsd.org/D29946
2021-05-11 12:34:20 +02:00
Alexander V. Chernikov
924d1c9a05 Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca.
2021-02-08 22:32:32 +00:00
Alexander V. Chernikov
4a01b854ca SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
2021-02-08 21:42:20 +00:00
Bjoern A. Zeeb
506512b170 ip_mroute: fix the viftable export sysctl
It seems that in r354857 I got more than one thing wrong.
Convert the SYSCTL_OPAQUE to a SYSCTL_PROC to properly export the these
days allocated and not longer static per-vnet viftable array.
This fixes a problem with netstat -g which would show bogus information
for the IPv4 Virtual Interface Table.

PR:		246626
Reported by:	Ozkan KIRIK (ozkan.kirik gmail.com)
MFC after:	3 days
2020-10-11 00:01:00 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Bjoern A. Zeeb
e387af1fa8 Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than
sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of
the struct.  All good things come in threes; I hope this is it on this one.

PR:		246629, 206583
Reported by:	kib
MFC after:	ASAP
2020-06-21 22:09:30 +00:00
Bjoern A. Zeeb
ce19cceb8d When converting the static arrays to mallocarray() in r356621 I missed
one place where we now need to multiply the size of the struct with the
number of entries.  This lead to problems when restarting user space
daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF
EADDRINUSE.
Properly zero all array elements to avoid this problem.

PR:		246629, 206583
Reported by:	(many)
MFC after:	4 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
2020-06-17 21:04:38 +00:00
Bjoern A. Zeeb
b7b3d237e7 The call into ifa_ifwithaddr() needs to be epoch protected; ortherwise
we'll panic on an assertion.
While here, leave a comment that the ifp was never protected and stable
(as glebius pointed out) and this needs to be fixed properly.

Discovered while working on:	PR 246629
Reviewed by:	glebius
MFC after:	4 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
2020-06-17 20:58:37 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Randall Stewart
481be5de9d White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by:	Netflix Inc.
2020-02-12 13:31:36 +00:00
Bjoern A. Zeeb
1a117215c7 Reduce the vnet_set module size of ip_mroute to allow loading as a module.
With VIMAGE kernels modules get special treatment as they need
to also keep the original values and make copies for each instance.
For that a few pages of vnet modspace are provided and the
kernel-linker and the VNET framework know how to deal with things.
When the modspace is (almost) full, other modules which would
overflow the modspace cannot be loaded and kldload will fail.

ip_mroute uses a lot of variable space, mostly be four big arrays:
set_vnet 0000000000000510 vnet_entry_multicast_register_if
set_vnet 0000000000000700 vnet_entry_viftable
set_vnet 0000000000002000 vnet_entry_bw_meter_timers
set_vnet 0000000000002800 vnet_entry_bw_upcalls

Dynamically malloc the three big ones for each instance we need
and free them again on vnet teardown (the 4th is an ifnet).
That way they only need module space for a single pointer and
allow a lot more modules using virtualized variables to be loaded
on a VNET kernel.

PR:		206583
Reviewed by:	hselasky, kp
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D22443
2019-11-19 15:38:55 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Hans Petter Selasky
59854ecf55 Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@ .

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by:	markj @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:54:41 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Andrey V. Elsukov
6d8fdfa9d5 Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
f6960e207e netinet silence warnings 2018-05-19 05:56:21 +00:00
Conrad Meyer
222daa421f style: Remove remaining deprecated MALLOC/FREE macros
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions).  Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by:	jhb
Reviewed by:	cy, imp, markj, rmacklem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14035
2018-01-25 22:25:13 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Eric van Gyzen
40769242ed Add some ntohl() love to r315277
inet_ntoa() and inet_ntoa_r() take the address in network
byte-order.  When I removed those calls, I should have
replaced them with ntohl() to make the hex addresses slightly
less unreadable.  Here they are.

See r315277 regarding classic blunders.

vangyzen: you're deep in "no good deed" territory, it seems
    --badger

Reported by:	ian
MFC after:	3 days
MFC when:	I finally get it right
Sponsored by:	Dell EMC
2017-03-14 20:57:54 +00:00
Eric van Gyzen
47d803ea71 KTR: log IPv4 addresses in hex rather than dotted-quad
When I made the changes in r313821, I fell victim to one of the
classic blunders, the most famous of which is: never get involved
in a land war in Asia.  But only slightly less well known is this:
Keep your brain turned on and engaged when making a tedious, sweeping,
mechanical change.  KTR can correctly log the immediate integral values
passed to it, as well as constant strings, but not non-constant strings,
since they might change by the time ktrdump retrieves them.

Reported by:	glebius
MFC after:	3 days
Sponsored by:	Dell EMC
2017-03-14 18:27:48 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Eric van Gyzen
8144690af4 Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
Kevin Lo
c3bef61e58 Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878
2016-09-15 07:41:48 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Pedro F. Giffuni
99d628d577 netinet: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae. tuexen
2016-04-15 15:46:41 +00:00
Alexander V. Chernikov
10e0e23528 Remove now-unused wrappers for various routing functions. 2016-01-14 08:54:44 +00:00
Alexander V. Chernikov
ea8d14925c Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00