netmap: fix lock order reversal related to kqueue usage
When using poll(), select() or kevent() on netmap file descriptors,
netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands,
before collecting the events that are ready. In other words, the
poll/kevent callback has side effects. This is done to avoid the
overhead of two system call per iteration (e.g., poll() + ioctl(NIOC*XSYNC)).
When the kqueue subsystem invokes the kqueue(9) f_event callback
(netmap_knrw), it holds the lock of the struct knlist object associated
to the netmap port (the lock is provided at initialization, by calling
knlist_init_mtx).
However, netmap_knrw() may need to wake up another netmap port (or even
the same one), which means that it may need to call knote().
Since knote() needs the lock of the struct knlist object associated to
the to-be-wake-up netmap port, it is possible to have a lock order reversal
problem (AB/BA deadlock).
This change prevents the deadlock by executing the knote() call in a
per-selinfo taskqueue, where it is possible to hold a mutex.
The change also adds a counter (kqueue_users) to keep track of how many
kqueue users are referencing a given struct nm_selinfo.
In this way, nm_os_selwakeup() can schedule the kevent notification
task only when kqueue is actually being used.
This is important to avoid wasting CPU in the common case where
kqueue is not used.
Reviewed by: aleksandr.fedorov_itglobal.com
Differential Revision: https://reviews.freebsd.org/D18956
netmap: refactor logging macros and pipes
Changelist:
- Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr
and nm_prlim, to avoid possible naming conflicts.
- Add netmap_krings_mode_commit() helper function and use that
to reduce code duplication.
- Refactor pipes control code to export some functions that
can be reused by the veth driver (on Linux) and epair(4).
- Add check to reject API requests with version less than 11.
- Small code refactoring for the null adapter.
Conflicts:
sys/dev/netmap/netmap.c
netmap: upgrade sync-kloop support
Add SYNC_KLOOP_MODE option, and add support for direct mode, where application
executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback.
netmap: fix knote() argument to match the mutex state
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9)
users. However, this function can be called from different code paths,
with different lock requirements.
This patch fixes the knote() call argument to match the relavant lock state.
Also, comments have been updated to reflect current code.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846
Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D18876
netmap: fix bug in netmap_poll() optimization
The bug was introduced by r339639, although it is present in the upstream
netmap code since 2015. It is due to resetting the want_rx variable to
POLLIN, rather than resetting it to POLLIN|POLLRDNORM.
It only affects select(), which uses POLLRDNORM. poll() is not affected,
because it uses POLLIN.
Also, it only affects FreeBSD, because Linux skips the optimization
implemented by the piece of code where the bug occurs.
To check if txsync can be skipped, it is necessary to look for
unseen TX space. However, this means comparing ring->cur
against ring->tail, rather than ring->head against ring->tail
(like nm_ring_empty() does).
Sponsored by: Sunny Valley Networks
netmap: move buf_size validation code to its own function
This code validates the netmap buf_size against the interface MTU
and maximum descriptor size, to make sure the values are consistent.
Moving this functionality to its own function is needed because this
function is also called by Linux-specific code.
netmap: netmap_transmit should honor bpf packet tap hook
This allows tcpdump to capture outbound kernel packets while
in netmap mode
Submitted by: Marc de la Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: vmaffione
MFC after: 1 week
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D17896
netmap: align codebase to the current upstream (760279cfb2730a585)
Changelist:
- Replace netmap passthrough host support with a more general
mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop.
No kernel threads are used to use this feature: the application
is required to spawn a thread (or a process) and issue a
SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The
kernel loop is executed by the ioctl implementation, which returns
to userspace only when a different thread calls SYNC_KLOOP_STOP
or the netmap file descriptor is closed.
- Update the if_ptnet driver to cope with the new data structures,
and prune all the obsolete ptnetmap code.
- Add support for "null" netmap ports, useful to allocate netmap_if,
netmap_ring and netmap buffers to be used by specialized applications
(e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
- Various fixes and code refactoring.
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D18015
netmap: align codebase to the current upstream (sha 8374e1a7e6941)
Changelist:
- Move large parts of VALE code to a new file and header netmap_bdg.[ch].
This is useful to reuse the code within upcoming projects.
- Improvements and bug fixes to pipes and monitors.
- Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to
handle differences between FreeBSD and Linux.
- Introduce some new helper functions to handle more host rings and fake
rings (netmap_all_rings(), netmap_real_rings(), ...)
- Added new sysctl to enable/disable hw checksum in emulated netmap mode.
- nm_inject: add support for NS_MOREFRAG
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D17364
o Restore netmap emulation mode to working order, including
fixing the destructor panics on detach.
o Omit pipe additions to these fixes, likely problematic for
Suricata to pass traffic like it does on 11.0 without this
patch.
o Allow to build the module without errors in the tree.
Many thanks to Vincenzo Maffione for assistance and review! :)
From b497fe34fd275da6b850bf271f510d02b888b8bc Mon Sep 17 00:00:00 2001
From: Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Date: Thu, 2 Jun 2016 00:21:40 +0200
Subject: [PATCH] allocate only the rings requested by the user
From 09936864fa5b67b82ef4a9907819b7018e9a38f2 Mon Sep 17 00:00:00 2001
From: Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Date: Wed, 20 Jul 2016 20:35:12 +0000
Subject: [PATCH] freebsd: fix const-related warning
From ab90c6c10224fefbb6a6c6e0b92e6ba80e5b694d Mon Sep 17 00:00:00 2001
From: Vincenzo Maffione <v.maffione@gmail.com>
Date: Wed, 28 Sep 2016 18:39:55 +0200
Subject: [PATCH] freebsd: generic: change mbuf allocation management
From fe811e11b2c37fc274a1134e1c10b2f6ada1a91c Mon Sep 17 00:00:00 2001
From: Vincenzo Maffione <v.maffione@gmail.com>
Date: Thu, 29 Sep 2016 08:54:52 +0200
Subject: [PATCH] freebsd: generic: call m_extadd() only once for each mbuf