If the dns_request send callback is delayed, the dst API would get
deinitialized and then the detach from the tsig key would cause an
assertion failure.
Shutdown the isc_managers early, and only then dereference the dst
objects when cleaning up the resources used by nsupdate.
(cherry picked from commit be34b1c535)
The order in which the netievents are processed on the network manager
loop is not guaranteed. Therefore the recv/read callback can come
earlier than the send/write callback.
The dns_request API wasn't ready for this reordering and it was
destroying the dns_request_t object before the send callback has been
called.
Add additional attach/detach in the req_send()/req_senddone() functions
to make sure we don't destroy the dns_request_t while it's still being
references by asynchronous call.
(cherry picked from commit f3ca90a804)
For the reference, the _cancel_lookup() function iterates through
the lookup's queries list and detaches them. In the ideal scenario,
that should be the last reference and the query will be destroyed
after that, but it is also possible that we are still expecting a
callback, which also holds a reference (for example, _cancel_lookup()
could have been called from recv_done(), when send_done() was still
not executed).
The start_udp() and start_tcp() functions are currently designed in
slightly different ways: start_udp() creates a new query attachment
`connectquery`, to be called in the callback function, while
start_tcp() does not, which is a bug, but is hidden by the fact
that when the query is being erroneously destroyed prematurely (before
_cancel_lookup() is called) in the result of that, it also gets
de-listed from the lookup's queries' list, so _cancel_lookup() doesn't
even try to detach it.
For better understanding, here's an illustration of the query's
references count changes, and from where it was changed:
UDP
---
1. _new_query() -> refcount = 1 (initial)
2. start_udp() -> refcount = 2 (lookup->current_query)
3. start_udp() -> refcount = 3 (connectquery)
4. udp_ready() -> refcount = 4 (readquery)
5. udp_ready() -> refcount = 5 (sendquery)
6. udp_ready() -> refcount = 4 (lookup->current_query)
7. udp_ready() -> refcount = 3 (connectquery)
8. send_done() -> refcount = 2 (sendquery)
9. recv_done() -> refcount = 1 (readquery)
10. _cancel_lookup() -> refcount = 0 (initial)
11. the query gets destroyed and removed from `lookup->q`
TCP, fortunate scenario
-----------------------
1. _new_query() -> refcount = 1 (initial)
2. start_tcp() -> refcount = 2 (lookup->current_query)
3. launch_next_query() -> refcount = 3 (readquery)
4. launch_next_query() -> refcount = 4 (sendquery)
5. tcp_connected() -> refcount = 3 (lookup->current_query)
6. tcp_connected() -> refcount = 2 (bug, there was no connectquery)
7. send_done() -> refcount = 1 (sendquery)
8. recv_done() -> refcount = 0 (readquery)
9. the query gets prematurely destroyed and removed from `lookup->q`
10. _cancel_lookup() -> the query is not in `lookup->q`
TCP, unfortunate scenario, revealing the bug
--------------------------------------------
1. _new_query() -> refcount = 1 (initial)
2. start_tcp() -> refcount = 2 (lookup->current_query)
3. launch_next_query() -> refcount = 3 (readquery)
4. launch_next_query() -> refcount = 4 (sendquery)
5. tcp_connected() -> refcount = 3 (lookup->current_query)
6. tcp_connected() -> refcount = 2 (bug, there was no connectquery)
7. recv_done() -> refcount = 1 (readquery)
8. _cancel_lookup() -> refcount = 0 (the query was in `lookup->q`)
9. we hit an assertion here when trying to destroy the query, because
sendhandle is not detached (which is done by send_done()).
10. send_done() -> this never happens
This commit does the following:
1. Add a `connectquery` attachment in start_tcp(), like done in
start_udp().
2. Add missing _cancel_lookup() calls for error scenarios, which
were possibly missing because before fixing the bug, calling
_cancel_lookup() and then calling query_detach() would cause
an assertion.
3. Log a debug message and call isc_nm_cancelread(query->readhandle)
for every query in the lookup from inside the _cancel_lookup()
function, like it is done in _cancel_all().
4. Add a `canceled` property for the query which becomes `true` when
the lookup (and subsequently, its queries) are canceled.
5. Use the `canceled` property in the network manager callbacks to
know that the query was canceled, and act like `eresult` was equal
to `ISC_R_CANCELED`.
(cherry picked from commit 4043fe9090)
There was a missing UNLOCK_LOOKUP in the recv_done() callback when
the operation had been canceled. That omission could result in a
deadlock situation.
(cherry picked from commit 98820aef7e)
For each algorithm there must be a key performing the KSK and
ZSK rolls. After reading the keys from named.conf check that
each algorithm present has both rolls. CSK implicitly has both
rolls.
(cherry picked from commit 9bcf45f4ce)
bad-ksk-without-zsk.conf only has a ksk defined without a
matching zsk for the same algorithm.
bad-zsk-without-ksk.conf only has a zsk defined without a
matching ksk for the same algorithm.
bad-unpaired-keys.conf has two keys of different algorithms
one ksk only and the other zsk only
(cherry picked from commit f23e86b96b)
BIND unconditionally uses shims for BN_GENCB_new(), BN_GENCB_free(),
and BN_GENCB_get_arg() for all LibreSSL versions and, correctly, for
OpenSSL <1.1.0 versions.
This breaks LibreSSL compilation starting with LibreSSL 3.5.0.
Use autoconf check instead to check whether the family of the functions
are available.
(cherry picked from commit 749973f3259b7638a6af02b7da2f40ae28bdd402)
LibreSSL 3.5.0 fails to compile with these shims. We could have just
removed the LibreSSL check from the pre-processor condition, but it
seems that these shims are no longer needed because all the supported
versions of OpenSSL and LibreSSL have those functions.
According to EVP_ENCRYPTINIT(3) manual page in LibreSSL,
EVP_CIPHER_CTX_new() and EVP_CIPHER_CTX_free() first appeared in
OpenSSL 0.9.8b, and have been available since OpenBSD 4.5.
(cherry picked from commit a3789053682b57a2031de8c544134f1923e76cf3)
the "zone" clause can be documented using, for instance,
`cfg_test --zonegrammar primary", which prints only
options that are valid in primary zones. this was not
the method being used when generating the named.conf
man page; instead, "zone" was documented with all possible
options, and no zone types at all.
this commit removes "zone" from the generic documentation
and adds include statements in named.conf.rst so that
correct zone grammars will be included in the man page.
(cherry picked from commit 4ca74eee49)
this is similar to the input found by ClusterFuzz Issue 45027 with
the 0xff characters replaced for readability.
(cherry picked from commit d36938321e)
when parsing key pairs, if the '=' character fell at max_token
a protective INSIST preventing buffer overrun could be triggered.
Attempt to grow the buffer immediately before the INSIST.
Also removed an unnecessary INSIST on the opening double quote
of key buffer pair.
(cherry picked from commit 4c356d2770)
By default C promotes short unsigned values to signed int which
leads to undefined behaviour when the value is shifted by too much.
Force unsigned arithmetic to be perform by explicitly casting to a
unsigned type.
(cherry picked from commit b8b99603f1)
The isc__nmsocket_reset() was missing a case for raw TCP sockets (used
by RNDC and DoH) which would case a assertion failure when write timeout
would be triggered.
TCP sockets are now also properly handled in isc__nmsocket_reset().
(cherry picked from commit b220fb32bd)
we now document zone type as either "primary" or "secondary",
omitting the old terms (though they are still accepted).
(cherry picked from commit 0bde07261b)
"masters" and "default-masters" are now flagged so they will
not be included in the named.conf man page, despite being
accepted as valid options by the parser for backward
compatibiility.
(cherry picked from commit 0e57fc160e)
... along with dns_rdataclass_fromtext and dns_rdatatype_fromtext
Most of the test binary is modified named-rrchecker. Main differences:
- reads single RR and exists
- does not refuse meta classes and rr types
We actually do have some fromtext code for meta-things so erroring out
in named-rrchecker would prevent us from testing this code.
Corpus has examples of all currently supported RR types. I did not do
any minimization.
In future use command
diff -U0 \
<(sed -n -e 's/^.*fromtext_\(.*\)(.*$/\1/p' lib/dns/code.h | \
sort) \
<(ls fuzz/dns_rdata_fromtext.in/)
to check for missing RR types.
(cherry picked from commit dc9ba2d3ef)
Corpus focuses on "extra" things in master files like $GENERATE etc.
Text encoding for RRs is thoroughly tested in dns_rdata_fromtext
fuzzer.
(cherry picked from commit 5076355822)
When isc__nm_uvreq_t gets deactivated, it could be just put onto array
stack to be reused later to save some initialization time.
Unfortunately, this might hide some use-after-free errors.
Disable the inactive uvreqs caching when compiled with Address or
Thread Sanitizer.
(cherry picked from commit be339b3c83)
When isc_nmhandle_t gets deactivated, it could be just put onto array
stack to be reused later to safe some initialization time.
Unfortunately, this might hide some use-after-free errors.
Disable the inactive handles caching when compiled with Address or
Thread Sanitizer.
(cherry picked from commit 92cce1da65)
The isc__nmsocket_t has locked array of isc_nmhandle_t that's not used
for anything. The isc__nmhandle_get() adds the isc_nmhandle_t to the
locked array (and resized if necessary) and removed when
isc_nmhandle_put() finally destroys the handle. That's all it does, so
it serves no useful purpose.
Remove the .ah_handles, .ah_size, and .ah_frees members of the
isc__nmsocket_t and .ah_pos member of the isc_nmhandle_t struct.
(cherry picked from commit e2555a306f)
When the TCP, TCPDNS or TLSDNS connection times out, the isc__nm_uvreq_t
would be pushed into sock->inactivereqs before the uv_tcp_connect()
callback finishes. Because the isc__nmsocket_t keeps the list of
inactive isc__nm_uvreq_t, this would cause use-after-free only when the
sock->inactivereqs is full (which could never happen because the failure
happens in connection timeout callback) or when the sock->inactivereqs
mechanism is completely removed (f.e. when running under Address or
Thread Sanitizer).
Delay isc__nm_uvreq_t deallocation to the connection callback and only
signal the connection callback should be called by shutting down the
libuv socket from the connection timeout callback.
(cherry picked from commit 3268627916)
When the isc_netmgr is being destroyed, the normal and priority queues
should be dequeued and netievents properly freed. This wasn't the case.
(cherry picked from commit 88418c3372)
Commit aab691d512 did not fix all possible
scenarios in which the ns_statscounter_recursclients counter underflows.
The solution implemented therein can be ineffective e.g. when CNAME
chaining happens with prefetching enabled.
Here is an example recursive resolution scenario in which the
ns_statscounter_recursclients counter can underflow with the current
logic in effect:
1. Query processing starts, the answer is not found in the cache, so
recursion is started. The NS_CLIENTATTR_RECURSING attribute is set.
ns_statscounter_recursclients is incremented (Δ = +1).
2. Recursion completes, returning a CNAME. client->recursionquota is
non-NULL, so the NS_CLIENTATTR_RECURSING attribute remains set.
ns_statscounter_recursclients is decremented (Δ = 0).
3. Query processing restarts.
4. The current QNAME (the target of the CNAME from step 2) is found in
the cache, with a TTL low enough to trigger a prefetch.
5. query_prefetch() attaches to client->recursionquota.
ns_statscounter_recursclients is not incremented because
query_prefetch() does not do that (Δ = 0).
6. Query processing restarts.
7. The current QNAME (the target of the CNAME from step 4) is not found
in the cache, so recursion is started. client->recursionquota is
already attached to (since step 5) and the NS_CLIENTATTR_RECURSING
attribute is set (since step 1), so ns_statscounter_recursclients is
not incremented (Δ = 0).
8. The prefetch from step 5 completes. client->recursionquota is
detached from in prefetch_done(). ns_statscounter_recursclients is
not decremented because prefetch_done() does not do that (Δ = 0).
9. Recursion for the current QNAME completes. client->recursionquota
is already detached from, i.e. set to NULL (since step 8), and the
NS_CLIENTATTR_RECURSING attribute is set (since step 1), so
ns_statscounter_recursclients is decremented (Δ = -1).
Another possible scenario is that after step 7, recursion for the target
of the CNAME from step 4 completes before the prefetch for the CNAME
itself. fetch_callback() then notices that client->recursionquota is
non-NULL and decrements ns_statscounter_recursclients, even though
client->recursionquota was attached to by query_prefetch() and therefore
not accompanied by an incrementation of ns_statscounter_recursclients.
The net result is also an underflow.
Instead of trying to properly handle all possible orderings of events
set into motion by normal recursion and prefetch-triggered recursion,
adjust ns_statscounter_recursclients whenever the recursive clients
quota is successfully attached to or detached from. Remove the
NS_CLIENTATTR_RECURSING attribute altogether as its only purpose is made
obsolete by this change.
(cherry picked from commit f7482b68b9)