Commit graph

14242 commits

Author SHA1 Message Date
Ondřej Surý
417097450a Check view->adb in dns_view_flushcache()
The call to dns_view_flushcache() is done under exclusive mode, but we
still need to check if view->adb is still attached before calling
dns_adb_flush() because the shutdown might have been already
initialized.  This most likely only a theoretical problem on shutdown
because there's either no way how to initiate cache flush when shutting
down or very slim window where the `rndc flush` would have to hit the
slim time during named shutdown.
2022-11-11 11:47:44 +01:00
Ondřej Surý
a8ba240325 Don't use view->resolver directly when priming in dns_view_find()
When starting priming from dns_view_find(), the dns_view shutdown could
be initiated by different thread, detaching from the resolver.  Use
dns_view_getresolver() to attach to the resolver under view->lock, so we
don't try to call dns_resolver_prime() with NULL pointer.

There are more accesses to view->resolver, (and also view->adb and
view->requestmgr that suffer from the same problem) in the dns_view
module, but they are all done in exclusive mode or under a view->lock.
2022-11-11 11:47:44 +01:00
Ondřej Surý
e4654d1a6a Bump the allowed HTTP headers in statschannel to 100
Firefox 90+ apparently sends more than 10 headers, so we need to bump
the number to some higher number.  Bump it to 100 just to be on a save
side, this is for internal use only anyway.
2022-11-10 16:34:26 +01:00
Ondřej Surý
b7eabb6394
Use isc_hashmap instead of isc_ht in the dns_resolver API
Replace the use of isc_ht API with isc_hashmap API in the dns_resolver
implementation.  This requires extending the fctxbucket_t structure to
include keysize and copy of the key because the isc_hashmap API needs
the raw key in case of resizing the hashmap table.
2022-11-10 15:07:19 +01:00
Ondřej Surý
e1220a2d4f
Use isc_hashmap instead of isc_ht in the dns_adb API
Replace the use of isc_ht API with isc_hashmap API in the dns_adb
database implementation.  This requires extending the
dns_adbnamebucket_t and dns_adbentrybucket_t structures to include
keysize and copy of the key because the isc_hashmap API needs the raw
key in case of resizing the hashmap table.
2022-11-10 15:07:19 +01:00
Ondřej Surý
f46ce447a6
Add isc_hashmap API that implements Robin Hood hashing
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood Hashing which is open-addressing hash table
   algorithm (e.g. no linked-lists)

2. No memory allocations - the array to store the nodes is made of
   isc_hashmap_node_t structures instead of just pointers, so there's
   only allocation on resize.

3. The key is not copied into the hashmap node and must be also stored
   externally, either as part of the stored value or in any other
   location that's valid as long the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
2022-11-10 15:07:19 +01:00
Ondřej Surý
9d2f22e666
Properly name the loop->mctx
The per loop memory context were unnamed, properly name them as
'loop<tid>'.
2022-11-08 13:32:13 +01:00
Mark Andrews
044c3b2bb8 Add missing closing ')' to update-policy documentation
The opening '(' before local was not being matched by a closing
')' after the closing '};'.
2022-11-04 10:37:47 +00:00
Ondřej Surý
96e7bf76e7 Don't release the tree read lock in dereference_iter_node()
Previously, the tree read lock could be upgraded to a write lock in
decrement_reference() and then downgraded back to read lock in
dereference_iter_node().  When the use of isc_rwlock_downgrade() was
removed, the downgrade was changed to a simple unlock+lock. This allows
some delete operations to sneak in and delete nodes that the iterator
expects to be in place.

Expand decrement_reference() so the caller can indicate whether the
tree read lock should be upgraded, and disallow the upgrade when
calling from dereference_iter_node(), so there will be no need to
release the lock afterward.
2022-11-03 14:07:44 +00:00
Ondřej Surý
80e66fbd2d
Don't use dns_zone_attach() in zone_refreshkeys()
The zone_refreshkeys() could run before the zone_shutdown(), but after
the last .erefs has been "detached" causing assertion failure when doing
dns_zone_attach().  Remove the use of .erefs (dns_zone_attach/detach)
and replace it with using the .irefs and additional checks whether the
zone is exiting in the callbacks.
2022-11-03 14:29:32 +01:00
Matthijs Mekking
332b98ae49 Don't allow DNSSEC records in the raw zone
There was an exception for dnssec-policy that allowed DNSSEC in the
unsigned version of the zone. This however causes a crash if the
zone switches from dynamic to inline-signing in the case of NSEC3,
because we are now trying to add an NSEC3 record to a non-NSEC3 node.
This is because BIND expects none of the records in the unsigned
version of the zone to be NSEC3.

Remove the exception for dnssec-policy when copying non DNSSEC
records, but do allow for DNSKEY as this may be a published DNSKEY
from a different provider.
2022-11-03 10:20:05 +01:00
Ondřej Surý
c429b52533
Don't cleanup the dead nodes when pruning the tree
The dead nodes might get reactivated during the db iterator walks the
version of the tree, so we can't cleanup the dead nodes while the db
version is open.  Restore the previous behaviour that cleaned up the
dead nodes when we are closing the version.
2022-11-03 09:06:08 +01:00
Ondřej Surý
be204bf4c7
Cleanup the dead nodes when pruning the tree
While sending the node to prune_tree(), we can also cleanup dead nodes
because we already hold the tree and node bucket write locks.
2022-11-02 13:06:52 +01:00
Ondřej Surý
0492bbf590
Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the phtread-rwlock based isc_rwlock implementation
would be all tracked in the rwlock.c unit losing all useful information
as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
6bd201ccec
Remove one level of indirection from isc_rwlock [1/2]
Instead of checking the PTHREAD_RUNTIME_CHECK from the header, move it
to the pthread_rwlock implementation functions.  The internal isc_rwlock
actually cannot fail, so the checks in the header was useless anyway.
2022-11-02 10:27:09 +01:00
Ondřej Surý
98b7a93772
Remove isc_rwlock_downgrade() from isc_rwlock
The isc_rwlock_downgrade() is not used anywhere, so we can remove it and
make the pthread_rwlock implementation simpler.
2022-11-02 09:05:37 +01:00
Ondřej Surý
e5f7fe1f65
Add strong rwlock consistency checks to dns_rbtdb
The dns_rbtdb unit already tracks the state of the node and tree rwlocks
during the top level function and passes the states of the locks to the
called functions.

Add the tree locking family of macros modeled after node locking macros,
and expand both to track the state of the lock in an external variable.
Additionally, in developer mode, add precondition to the macros, so the
lock is in required state - this should cause an assertion failure on
double locking instead of the thread getting stuck.
2022-11-02 08:45:48 +01:00
Ondřej Surý
006a7f0cb6
Remove isc_rwlock_downgrade usage in rbtdb.c
The only place where isc_rwlock_downgrade was being used was the
decrement_reference() where the code tries either relocks the node
rwlock to write and then tries to upgrade the tree lock.  When returning
from the function it tries to restore the locks into a previous state
which is nice, but kind of moot, because at every use of
decrement_reference() the node locks is immediately or almost
immeditately unlocked, and same holds for the tree lock.

Instead of trying to restore the node and tree lock into the initial
state, the decrement_reference now returns the state of the locks, so
the caller can then use the right unlock operation (read or write).
Only when the tree lock was originally unlocked, the decrement_reference
unlocks the tree lock before returning to the caller.
2022-11-02 08:45:48 +01:00
Aram Sargsyan
354ae2d7e3 Don't trust a placeholder KEYDATA record
When named starts it creates an empty KEYDATA record in the managed-keys
zone as a placeholder, then schedules a key refresh. If key refresh
fails for some reason (e.g. connectivity problems), named will load the
placeholder key into secroots as a trusted key during the next startup,
which will break the chain of trust, and named will never recover from
that state until managed-keys.bind and managed-keys.bind.jnl files are
manually deleted before (re)starting named again.

Before calling load_secroots(), check that we are not dealing with a
placeholder.
2022-11-01 09:50:34 +00:00
Evan Hunt
31c53235dd Call dns_resolver_createfetch() asynchronously in zone_refreshkeys()
Because dns_resolver_createfetch() locks the view, it was necessary
to unlock the zone in zone_refreshkeys() before calling it in order
to maintain the lock order, and relock afterward. this permitted a race
with dns_zone_synckeyzone().

This commit moves the call to dns_resolver_createfetch() into a separate
function which is called asynchronously after the zone has been
unlocked.

The keyfetch object now attaches to the zone to ensure that
it won't be shut down before the asynchronous call completes.

This necessitated refactoring dns_zone_detach() so it always runs
unlocked. For managed zones it now schedules zone_shutdown() to
run asynchronously, and for unmanaged zones, it requires the last
dns_zone_detach() to be run without loopmgr running.
2022-10-31 14:34:12 -07:00
Evan Hunt
dc878e3098 isc_async_run() runs events in reverse order
when more than one event was scheduled in the isc_aysnc queue,
they were executed in reverse order. we need to pull events
off the back of queue instead the front, so that uv_loop will
run them in the right order.

note that isc_job_run() has the same behavior, because it calls
uv_idle_start() directly. in that case we just document it so
it'll be less surprising in the future.
2022-10-31 05:43:45 -07:00
Ondřej Surý
04670889bc Refactor dns_master_dump*async() to use offloaded work
The dns_master_dump*async() functions were using isc_async_run() to
schedule work on the active loop; use isc_work_enqueue() instead.
2022-10-31 10:30:27 +00:00
Evan Hunt
b54c721894 refactor dns_master_dump*async() to use loop callbacks
Asynchronous zone dumping now uses loop callbacks instead of
task events.
2022-10-31 10:30:27 +00:00
Evan Hunt
f92b946df3 fix a potential data race in zone_maintenance()
zone_maintenance() accessed zone timer information without locking.
2022-10-31 02:54:40 -07:00
Ondřej Surý
77aeed6231 Move the zone loading to the offloaded threads
Instead of doing incremental zone loading with fixed quantum - 100
loaded lines per event, move the zone loading process to the offloaded
libuv threads using isc_work_enqueue() API.

This has the advantage that the thread scheduling is given back to the
operating system that understands blocking operations, and the zone
loading operation doesn't block the networking threads directly.
2022-10-30 14:56:40 -07:00
Evan Hunt
dcc4c3e3ec Refactor dns_master_loadfileinc() to use loopmgr instead of tasks
Incremental file loads now use loopmgr events instead of task events.

The dns_master_loadstreaminc(), _loadbufferinc(), _loadlexer() and
_loadlexerinc() functions were not used in BIND, and have been removed.
2022-10-30 14:56:40 -07:00
Mark Andrews
da6359345e Add check-svcb to named
check-svcb signals whether to perform additional contraint tests
when loading / update primary zone files.
2022-10-29 00:22:54 +11:00
Mark Andrews
3881afeb15 Add dns_rdata_checksvcb
dns_rdata_checksvcb performs data entry checks on SVCB records.
In particular that _dns SVBC record have an 'alpn' and if that 'alpn'
parameter indicates HTTP is in use that 'dophath' is present.
2022-10-29 00:22:54 +11:00
Mark Andrews
f1043f19dd Add dns_name_isdnssvcb
dns_name_isdnssvcb looks for a name which starts with the label
_dns or _<port>._dns labels.
2022-10-29 00:22:54 +11:00
Matthijs Mekking
218c661b41 Fix update forwarding bug
The wrong tls configuration was picked here. It should be of the
primary that is selected by forward->which, not zone->curprimary.

This bug may cause BIND to select the wrong primary when retrieving
the TLS settings, or cause a crash in case the wrongly selected primary
has no TLS settings.
2022-10-27 12:22:23 +02:00
Ondřej Surý
6ba0a22627
Change the return type of isc_lex_create() to void
The isc_lex_create() cannot fail, so cleanup the return type from
isc_result_t to void.
2022-10-26 12:55:06 +02:00
Petr Špaček
baa71c5181
Remove unused lib/dns/tsec
dns_tsec API is not referenced anywhere, remove it.
This is a leftover after dns_client cleanup.

Related: !4835
2022-10-25 10:35:07 +02:00
Evan Hunt
67c0128ebb Fix an error when building with --disable-doh
The netievent handler for isc_nmsocket_set_tlsctx() was inadvertently
ifdef'd out when BIND was built with --disable-doh, resulting in an
assertion failure on startup when DoT was configured.
2022-10-24 13:54:39 -07:00
Aram Sargsyan
863f51466e Match prefetch eligibility behavior with ARM
ARM states that the "eligibility" TTL is the smallest original TTL
value that is accepted for a record to be eligible for prefetching,
but the code, which implements the condition doesn't behave in that
manner for the edge case when the TTL is equal to the configured
eligibility value.

Fix the code to check that the TTL is greater than, or equal to the
configured eligibility value, instead of just greater than it.
2022-10-21 10:19:23 +00:00
Aram Sargsyan
5da79e2be0 Call dns_adb_endudpfetch() on error path, if required
For UDP queries, after calling dns_adb_beginudpfetch() in fctx_query(),
make sure that dns_adb_endudpfetch() is also called on error path, in
order to adjust the quota back.
2022-10-21 08:08:55 +00:00
Aram Sargsyan
e4569373ca Always call dns_adb_endudpfetch() in fctx_cancelquery() for UDP queries
It is currently possible that dns_adb_endudpfetch() is not
called in fctx_cancelquery() for a UDP query, which results
in quotas not being adjusted back.

Always call dns_adb_endudpfetch() for UDP queries.
2022-10-21 08:08:47 +00:00
Aram Sargsyan
ac889684c7 Unlink the query under cleanup_query
In the cleanup code of fctx_query() function there is a code path
where 'query' is linked to 'fctx' and it is being destroyed.

Make sure that 'query' is unlinked before destroying it.
2022-10-21 08:08:37 +00:00
Evan Hunt
305a50dbe1 ensure RPZ lookups handle CD=1 correctly
RPZ rewrites called dns_db_findext() without passing through the
client database options; as as result, if the client set CD=1,
DNS_DBFIND_PENDINGOK was not used as it should have been, and
cache lookups failed, resulting in failure of the rewrite.
2022-10-19 11:36:11 -07:00
Ondřej Surý
13959781cb
Serialize the HTTP/1.1 statschannel requests
The statschannel truncated test still terminates abruptly sometimes and
it doesn't return the answer for the first query.  This might happen
when the second process_request() discovers there's not enough space
before the sending is complete and the connection is terminated before
the client gets the data.

Change the isc_http, so it pauses the reading when it receives the data
and resumes it only after the sending has completed or there's
incomplete request waiting for more data.

This makes the request processing slightly less efficient, but also less
taxing for the server, because previously all requests that has been
received via single TCP read would be processed in the loop and the
sends would be queued after the read callback has processed a full
buffer.
2022-10-19 14:45:36 +02:00
Ondřej Surý
dfaae53b9a
Fix the non-developer build with OpenSSL 1.0.2
In non-developer build, a wrong condition prevented the
isc__tls_malloc_ex, isc__tls_realloc_ex and isc__tls_free_ex to be
defined.  This was causing FTBFS on platforms with OpenSSL 1.0.2.
2022-10-19 14:41:10 +02:00
Artem Boldariev
09dcc914b4 TLS Stream: handle successful TLS handshake after listener shutdown
It was possible that accept callback can be called after listener
shutdown. In such a case the callback pointer equals NULL, leading to
segmentation fault. This commit fixes that.
2022-10-18 18:30:24 +03:00
Matthijs Mekking
a1d57fc8cb Change log level when doing rekey
This log happens when BIND checks the parental-agents if the DS has
been published. But if you don't have parental-agents set up, the list
of keys to check will be empty and the result will be ISC_R_NOTFOUND.
This is not an error, so change the log level to debug in this case.
2022-10-18 16:23:35 +02:00
Ondřej Surý
5e20c2ccfb
Replace (void *)-1 with ISC_LINK_TOMBSTONE
Instead of having "arbitrary" (void *)-1 to define non-linked, add a
ISC_LINK_TOMBSTONE(type) macro that replaces the "magic" value with a
define.
2022-10-18 11:36:15 +02:00
Ondřej Surý
cb3c36b8bf
Add ISC_{LIST,LINK}_INITIALIZER for designated initializers
Since we are using designated initializers, we were missing initializers
for ISC_LIST and ISC_LINK, add them, so you can do

    *foo = (foo_t){ .list = ISC_LIST_INITIALIZER };

Instead of:

    *foo = (foo_t){ 0 };
    ISC_LIST_INIT(foo->list);
2022-10-18 11:36:15 +02:00
Artem Boldariev
5ab2c0ebb3 Synchronise stop listening operation for multi-layer transports
This commit introduces a primitive isc__nmsocket_stop() which performs
shutting down on a multilayered socket ensuring the proper order of
the operations.

The shared data within the socket object can be destroyed after the
call completed, as it is guaranteed to not be used from within the
context of other worker threads.
2022-10-18 12:06:00 +03:00
Tony Finch
26ed03a61e Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
2022-10-17 13:43:59 +01:00
Tony Finch
a34a2784b1 De-duplicate some calls to strerror_r()
Specifically, when reporting an unexpected or fatal error.
2022-10-17 11:58:26 +01:00
Tony Finch
ec50c58f52 De-duplicate __FILE__, __LINE__
Mostly generated automatically with the following semantic patch,
except where coccinelle was confused by #ifdef in lib/isc/net.c

@@ expression list args; @@
- UNEXPECTED_ERROR(__FILE__, __LINE__, args)
+ UNEXPECTED_ERROR(args)
@@ expression list args; @@
- FATAL_ERROR(__FILE__, __LINE__, args)
+ FATAL_ERROR(args)
2022-10-17 11:58:26 +01:00
Aram Sargsyan
fddaebb285 Handle large numbers when parsing/printing a duration
The isccfg_duration_fromtext() function is truncating large numbers
to 32 bits instead of capping or rejecting them, i.e. 64424509445,
which is 0xf00000005, gets parsed as 32-bit value 5 (0x00000005).

Fail parsing a duration if any of its components is bigger than
32 bits. Using those kind of big numbers has no practical use case
for a duration.

The isccfg_duration_toseconds() function can overflow the 32 bit
seconds variable when calculating the duration from its component
parts.

To avoid that, use 64-bit calculation and return UINT32_MAX if the
calculated value is bigger than UINT32_MAX. Again, a number this big
has no practical use case anyway.

The buffer for the generated duration string is limited to 64 bytes,
which, in theory, is smaller than the longest possible generated
duration string.

Use 80 bytes instead, calculated by the '7 x (10 + 1) + 3' formula,
where '7' is the count of the duration's parts (year, month, etc.), '10'
is their maximum length when printed as a decimal number, '1' is their
indicator character (Y, M, etc.), and 3 is two more indicators (P and T)
and the terminating NUL character.
2022-10-17 08:45:45 +00:00
Aram Sargsyan
dc55f1ebb9 Fix an off-by-one error in cfg_print_duration()
The cfg_print_duration() checks added previously in the 'duration_test'
unit test uncovered a bug in cfg_print_duration().

When calculating the current 'str' pointer of the generated text in the
buffer 'buf', it erroneously adds 1 byte to compensate for that part's
indicator character. For example, to add 12 minutes, it needs to add
2 + 1 = 3 characters, where 2 is the length of "12", and 1 is the length
of "M" (for minute). The mistake was that the length of the indicator
is already included in 'durationlen[i]', so there is no need to
calculate it again.

In the result of this mistake the current pointer can advance further
than needed and end up after the zero-byte instead of right on it, which
essentially cuts off any further generated text. For example, for a
5 minutes and 30 seconds duration, instead of having this:

    'P', 'T', '5', 'M', '3', '0', 'S', '\0'

The function generates this:

    'P', 'T', '5', 'M', '\0', '3', '0', 'S', '\0'

Fix the bug by adding to 'str' just 'durationlen[i]' instead of
'durationlen[i] + 1'.
2022-10-17 08:45:26 +00:00