Some POSIX threads implementations (e.g. FreeBSD's libthr) allocate
memory on the heap when pthread_mutex_init() is called. Every call to
that function must be accompanied by a corresponding call to
pthread_mutex_destroy() or else the memory allocated for the mutex will
leak.
jemalloc can be used for detecting memory allocations which are not
released by a process when it exits. Unfortunately, since jemalloc is
also the system allocator on FreeBSD and a special (profiling-enabled)
build of jemalloc is required for memory leak detection, this method
cannot be used for detecting leaked memory allocated by libthr on a
stock FreeBSD installation.
However, libthr's behavior can be emulated on any platform by
implementing alternative versions of libisc functions for creating and
destroying mutexes that allocate memory using malloc() and release it
using free(). This enables using jemalloc for detecting missing
pthread_mutex_destroy() calls on any platform on which it works
reliably.
Introduce a new ISC_TRACK_PTHREADS_OBJECTS preprocessor macro, which
causes isc_mutex_t structures to be allocated on the heap by
isc_mutex_init() and freed by isc_mutex_destroy(). Reuse existing mutex
macros (after renaming them appropriately) for other operations.
Instead of returning error values from isc_rwlock_*(), isc_mutex_*(),
and isc_condition_*() macros/functions and subsequently carrying out
runtime assertion checks on the return values in the calling code,
trigger assertion failures directly in those macros/functions whenever
any pthread function returns an error, as there is no point in
continuing execution in such a case anyway.
Instead of using isc_once_do() on every isc_mutex_init() call, use the
global library constructor to initialize the default mutex attr
object (optionally with PTHREAD_MUTEX_ADAPTIVE_NP if supported) just
once when the library is loaded.
isc_rwlock_init() currently detects pthread_rwlock_init() failures using
a REQUIRE() assertion. Use the ERRNO_CHECK() macro for that purpose
instead, so that read-write lock initialization failures are handled
identically as condition variable (pthread_cond_init()) and mutex
(pthread_mutex_init()) initialization failures.
In a number of situations in pthreads-related code, a common sequence of
steps is taken: if the value returned by a library function is not 0,
pass errno to strerror_r(), log the string returned by the latter, and
immediately abort execution. Add an ERRNO_CHECK() preprocessor macro
which takes those exact steps and use it wherever (conveniently)
possible.
Notes:
1. The "log the return value of strerror_r() and abort" pattern is used
in a number of other places that this commit does not touch; only
"!= 0" checks followed by isc_error_fatal() calls with
non-customized error messages are replaced here.
2. This change temporarily breaks file name & line number reporting for
isc__mutex_init() errors, to prevent breaking the build. This issue
will be rectified in a subsequent change.
Commit 7b2ea97e46 introduced a logic bug
in resume_dslookup(): that function now only conditionally checks
whether DS chasing can still make progress. Specifically, that check is
only performed when the previous resume_dslookup() call invokes
dns_resolver_createfetch() with the 'nameservers' argument set to
something else than NULL, which may not always be the case. Failing to
perform that check may trigger assertion failures as a result of
dns_resolver_createfetch() attempting to resolve an invalid name.
Example scenario that leads to such outcome:
1. A validating resolver is configured to forward all queries to
another resolver. The latter returns broken DS responses that
trigger DS chasing.
2. rctx_chaseds() calls dns_resolver_createfetch() with the
'nameservers' argument set to NULL.
3. The fetch fails, so resume_dslookup() is called. Due to
fevent->result being set to e.g. DNS_R_SERVFAIL, the default branch
is taken in the switch statement.
4. Since 'nameservers' was set to NULL for the fetch which caused the
resume_dslookup() callback to be invoked
(fctx->nsfetch->private->nameservers), resume_dslookup() chops off
one label off fctx->nsname and calls dns_resolver_createfetch()
again, for a name containing one label less than before.
5. Steps 3-4 are repeated (i.e. all attempts to find the name servers
authoritative for the DS RRset being chased fail) until fctx->nsname
becomes stripped down the the root name.
6. Since resume_dslookup() does not check whether DS chasing can still
make progress, it strips off a label off the root name and continues
its attempts at finding the name servers authoritative for the DS
RRset being chased, passing an invalid name to
dns_resolver_createfetch().
Fix by ensuring resume_dslookup() always checks whether DS chasing can
still make progress when a name server fetch fails. Update code
comments to ensure the purpose of the relevant dns_name_equal() check is
clear.
There should be 2 keys with the same key id after the numerically
lower one is revoked (serial space arithmetic). The DS points
at the non-revoked key so validation should still succeed.
messages indicating the reason for a fallback to AXFR (i.e, because
the requested serial number is not present in the journal, or because
the size of the IXFR response would exceeed "max-ixfr-ratio") are now
logged at level info instead of debug(4).
Before this change the TLS code would ignore the accept callback result,
and would not try to gracefully close the connection. This had not been
noticed, as it is not really required for DoH. Now the code tries to
shut down the TLS connection gracefully when accepting it is not
successful.
This commit removes an assertion from the unit test which cannot be
guaranteed.
According to the test, exactly one client send must succeed. However,
it cannot really be guaranteed, as do not start to read data in the
accept callback on the server nor attach to the accepted handle. Thus,
we can expect the connection to be closed soon after we have returned
from the callback.
Interestingly enough, the test would pass just fine on TCP because:
a) there are fewer layers involved and thus there is less processing;
b) it is possible for the data to be sent and end up in an internal OS
socket buffer without being touched by an application's code on the
server. In such a case the client's write callback still would be
called successfully;
There is a chance for the test to succeed over TLS as well (as it
happily did before), but as the code has been changed to close unused
connections as soon as possible, the chance is far slimmer now.
What can be guaranteed is:
* cconnects == 1 (number client connections equals 1);
* saccepts == 1 (number of accepted connections equals 1).
Otherwise the code path will lead to a call to SSL_get_error()
returning SSL_ERROR_SSL, which in turn might lead to closing
connection to early in an unexpected way, as it is clearly not what is
intended.
The issue was found when working on loppmgr branch and appears to
be timing related as well. Might be responsible for some unexpected
transmission failures e.g. on zone transfers.
In some operations - most prominently when establishing connection -
it might be beneficial to bail out earlier when the network manager
is stopping.
The issue is backported from loopmgr branch, where such a change is
not only beneficial, but required.
In some cases - in particular, in case of errors, NULL might be passed
to a connection callback instead of a handle that could have led to
an abort. This commit ensures that such a situation will not occur.
The issue was found when working on the loopmgr branch.
This commit ensures that the underlying TCP socket of a TLS connection
gets closed earlier whenever there are no pending operations on it.
In the loop-manager branch, in some circumstances the connection
could have remained opened for far too long for no reason. This
commit ensures that will not happen.
When dnssec-policy is used, and the zone is not dynamic, BIND will
assume that the zone is inline-signed. But the function responsible
for this did not inherit the dnssec-policy option from the view or
options level, and thus never enabled inline-signing, while the zone
should have been.
This is fixed by this commit.
Fix a comment, ensuring the right parameters are used (zone is
parameter $3, not $2) and add view and policy parameters to the comment.
Fix the view tests and test the correct view (example3 instead of
example2).
Fix placement of "n=$((n+1)" for two test cases.