1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
whether the socket was still alive and scheduling reads/sends on
closed socket.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS).
(cherry picked from commit f7c82e406e)
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the minimal
required changes by changing the defaults for `edns-udp-size`,
`max-udp-size` and `nocookie-udp-size` to `1232` (the value picked by
DNS Flag Day 2020).
(cherry picked from commit bb990030d3)
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records. The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
(cherry picked from commit d4976e0ebe)
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
(cherry picked from commit 57b4dde974)
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
(cherry picked from commit 23c7373d68)
this will allow recv event handlers to distinguish between cases
in which the region is NULL because of error, shutdown, or cancelation.
(cherry picked from commit 75c985c07f)
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
(cherry picked from commit 9e740cad21)
The following reverted changes will be picked again as part of the
netmgr sync with main branch.
Revert "Merge branch '1996-confidential-issue-v9_16' into 'security-v9_16'"
This reverts commit e160b1509f, reversing
changes made to c01e643715.
Revert "Merge branch '2038-use-freebind-when-bind-fails-v9_16' into 'v9_16'"
This reverts commit 5f8ecfb918, reversing
changes made to 23021385d5.
Revert "Merge branch '1936-blackhole-fix-v9_16' into 'v9_16'"
This reverts commit f20bc90a72, reversing
changes made to 490016ebf1.
Revert "Merge branch '1938-fix-udp-race' into 'v9_16'"
This reverts commit 0a6c7ab2a9, reversing
changes made to 4ea84740e6.
Revert "Merge branch '1947-fix-tcpdns-race' into 'v9_16'"
This reverts commit 4ea84740e6, reversing
changes made to d761cd576b.
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
(cherry picked from commit 33eefe9f85)
This commit will be used as a base for the next code updates in
order to have a better control of dns_message_t objects' lifetime.
(cherry picked from commit 12d6d13100)
Currently, building BIND using "--without-dlopen" universally breaks
building unit tests which employ the --wrap linker option (because the
replacement functions are put in a shared library and building shared
objects requires "--with-dlopen"). Fix by moving the overridden symbol,
isc_nmhandle_unref(), to lib/ns/tests/nstest.c and dropping
lib/ns/tests/wrap.c altogether. This makes lib/ns/tests/Makefile.in
simpler and prevents --without-dlopen from messing with the process of
building unit tests.
Remove parts of configure.ac which are made redundant by the above
changes.
Put the replacement definition of isc_nmhandle_unref() inside an #ifdef
block, so that the build does not break for non-libtool builds (see
below).
These changes allow the broadest possible set of build variants to work
while also simplifying the build process:
- for libtool builds, overriding isc_nmhandle_unref() is done by
placing that symbol directly in lib/ns/tests/nstest.c and relying on
the dynamic linker to perform symbol resolution in the expected way
when the test binary is run,
- for non-libtool builds, overriding isc_nmhandle_unref() is done
using the --wrap linker option (the libtool approach cannot be used
in this case as multiple strong symbols with the same name cannot
coexist in the same binary),
- the "--without-dlopen" option no longer affects building unit tests.
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:
1. Original query context is created.
2. An AAAA RRset is found in cache.
3. Client-specific data is allocated from the filter-aaaa memory pool.
4. Recursion is triggered for an A RRset.
5. Original query context is torn down.
6. Recursion for an A RRset completes.
7. A second query context is created.
8. Client-specific data is retrieved from the filter-aaaa memory pool.
9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.
However, steps 6-12 are not executed if recursion for an A RRset is
canceled. Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released. This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.
Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
(cherry picked from commit 86eddebc83)
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call. Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
(cherry picked from commit f0d9bf7c30)
Now that the log message has been printed set the result code to
DNS_R_FORMERR. We don't do this via dns_result_torcode() as we
don't want upstream errors to produce FORMERR if that processing
end with DNS_R_BADTSIG.
(cherry picked from commit 20488d6ad3)
The basic scenario for the problem was that in the process of
resolving a query, if any rrset was eligible for prefetching, then it
would trigger a call to query_prefetch(), this call would run in
parallel to the normal query processing.
The problem arises due to the fact that both query_prefetch(), and,
in the original thread, a call to ns_query_recurse(), try to attach
to the recursionquota, but recursing client stats counter is only
incremented if ns_query_recurse() attachs to it first.
Conversely, if fetch_callback() is called before prefetch_done(),
it would not only detach from recursionquota, but also decrement
the stats counter, if query_prefetch() attached to te quota first
that would result in a decrement not matched by an increment, as
expected.
To solve this issue an atomic bool was added, it is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.
For a more compreensive explanation check the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
(cherry picked from commit 23c7373d68)
NS_CLIENT_TCP_BUFFER_SIZE was 2 byte too large following the
move to netmgr add associated changes to lib/ns/client.c and
as a result an INSIST could be trigger if the DNS message being
constructed had a checkpoint stage that fell in those two extra
bytes. Adjusted NS_CLIENT_TCP_BUFFER_SIZE and cleaned up
client_allocsendbuf now that the previously reserved 2 bytes
are no longer used.
(cherry picked from commit 5a92af19b7dce684b0e6670ae6ec1c4c58613263)
- clone keynode->dsset rather than return a pointer so that thread
use is independent of each other.
- hold a reference to the dsset (keynode) so it can't be deleted
while in use.
- create a new keynode when removing DS records so that dangling
pointers to the deleted records will not occur.
- use a rwlock when accessing the rdatalist to prevent instabilities
when DS records are added.
(cherry picked from commit e5b2eca1d3)
Originally, every library and binaries got linked to everything, which
creates unnecessary overlinking. This wasn't as straightforward as it
should be as we still support configuration without libtool for 9.16.
Couple of smaller issues related to include headers and an issue where
sanitizer overload dlopen and dlclose symbols, so we were getting false
negatives in the autoconf test.
Underlinking states for the situation when a binary uses a symbol not provided
by libraries it is directly linked to. The libns was not linked to libisc and
libdns, and libirs was not linked to libisc, libdns and libisccfg) while using
symbols from these libraries directly.
All our MSVS Project files share the same intermediate directory. We
know that this doesn't cause any problems, so we can just disable the
detection in the project files.
Example of the warning:
warning MSB8028: The intermediate directory (.\Release\) contains files shared from another project (dnssectool.vcxproj). This can lead to incorrect clean and rebuild behavior.
(cherry picked from commit b6c2012d93)
Our vcxproj files set the WarningLevel to Level3, which is too verbose
for a code that needs to be portable. That basically leads to ignoring
all the errors that MSVC produces. This commits downgrades the
WarningLevel to Level1 and enables treating warnings as errors for
Release builds. For the Debug builds the WarningLevel got upgraded to
Level4, and treating warnings as errors is explicitly disabled.
We should eventually make the code clean of all MSVC warnings, but it's
a long way to go for Level4, so it's more reasonable to start at Level1.
For reference[1], these are the warning levels as described by MSVC
documentation:
* /W0 suppresses all warnings. It's equivalent to /w.
* /W1 displays level 1 (severe) warnings. /W1 is the default setting
in the command-line compiler.
* /W2 displays level 1 and level 2 (significant) warnings.
* /W3 displays level 1, level 2, and level 3 (production quality)
warnings. /W3 is the default setting in the IDE.
* /W4 displays level 1, level 2, and level 3 warnings, and all level 4
(informational) warnings that aren't off by default. We recommend
that you use this option to provide lint-like warnings. For a new
project, it may be best to use /W4 in all compilations. This option
helps ensure the fewest possible hard-to-find code defects.
* /Wall displays all warnings displayed by /W4 and all other warnings
that /W4 doesn't include — for example, warnings that are off by
default.
* /WX treats all compiler warnings as errors. For a new project, it
may be best to use /WX in all compilations; resolving all warnings
ensures the fewest possible hard-to-find code defects.
1. https://docs.microsoft.com/en-us/cpp/build/reference/compiler-option-warning-level?view=vs-2019
(cherry picked from commit 789d253e3d)
In case of normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created. Then
the .recursionquota is detached and the counter decremented in the
fetch_callback().
In case of prefetch or rpzfetch, the quota is attached, but the counter is not
incremented. When we reach the soft-quota, the function returns early but don't
detach from the quota, and it gets destroyed during the ns_client_endrequest(),
so no memory was leaked.
But because the ns_statscounter_recursclients is only incremented during the
normal fetch the counter would be incorrectly decremented on two occassions:
1) When we reached the softquota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
function was never called.
(cherry picked from commit 78886d4bed)
tcpdns used transport-specific functions to operate on the outer socket.
Use generic ones instead, and select the proper call in netmgr.c.
Make the missing functions (e.g. isc_nm_read) generic and add type-specific
calls (isc__nm_tcp_read). This is the preparation for netmgr TLS layer.
(cherry picked from commit 5fedd21e16)
Updated version and CHANGES files with new release number.
Check the API files:
- lib/bind9/api:
Source code changes, but no interface changes: increment
LIBREVISION.
- lib/dns/api:
Function dns_acl_match changed, struct dns_badcache changed,
function dns_badcache_add changed, function dns_clent_startupdate
changed, struct dns_compress changed, struct dns_resolver changed,
rwlock size changed. This means a LIBINTERFACE increment.
- lib/irs/api:
Source code changes, but no interface changes: increment
LIBREVISION.
- lib/isc/api:
The structs isc__networker and isc_nmsocket changed. This means
increment LIBINTERFACE. The functions isc_uv_export and
isc_uv_import are removed, so LIBAGE must beq zero.
- lib/isccc/api:
Source code changes, but no interface changes: increment
LIBREVISION.
- lib/isccfg/api:
Source code changes, but no interface changes: increment
LIBREVISION.
- lib/ns/api:
Function ns_clientmgr_create, ns_interfacemgr_create, and
structs ns_clientmgr, ns_interface, ns_interfacemgr changed:
increment LIBINTERFACE.
No need to update README or release notes.
Updated CHANGES: Add GitLab MR reference to entry 5357. Remove
merge conflict gone wrong ("max-ixfr-ratio" is not in 9.16).
Add /util/check-make-install.in to .gitattributes.
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.
(cherry picked from commit 0b793166d0)
Fixes a race between ns_client_killoldestquery and ns_client_endrequest -
killoldestquery takes a client from `recursing` list while endrequest
destroys client object, then killoldestquery works on a destroyed client
object. Prevent it by holding reclist lock while cancelling query.
(cherry picked from commit df3dbdff81)
When --with-zlib is passed to ./configure (or when the latter
autodetects zlib's presence), libisc uses certain zlib functions and
thus libisc's users should be linked against zlib in that case. Adjust
Makefile variables appropriately to prevent shared build failures caused
by underlinking.
(cherry picked from commit fc967ba092)
Fix a potential assertion failure on shutdown in ns__client_endrequest.
Scenario:
1. We are shutting down, interface->clientmgr is gone.
2. We receive a packet, it gets through ns__client_request
3. mgr == NULL, return
4. isc_nmhandle_detach calls ns_client_reset_cb
5. ns_client_reset_cb calls ns_client_endrequest
6. INSIST(client->state == NS_CLIENTSTATE_WORKING ||
client->state == NS_CLIENTSTATE_RECURSING) is not met
- we haven't started processing this packet so
client->state == NS_CLIENTSTATE_READY.
As a solution - don't do anything in ns_client_reset_cb if the client
is still in READY state.
(cherry picked from commit b0888ff039)
this corrects some style glitches such as:
```
long_function_call(arg, arg2, arg3, arg4, arg5, "str"
"ing");
```
...by adjusting the penalties for breaking strings and call
parameter lists.
(cherry picked from commit 0002377dca)