bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-02-27 20:11:12 -05:00

Author	SHA1	Message	Date
Ondřej Surý	2011efa99d	Disable safe-guard assertion in DLL_THREAD_ATTACH/DLL_THREAD_DETACH The BIND 9 libraries on Windows define the DllMain() optional entry point into a dynamic-link library (DLL). When the system starts or terminates a process or thread, it calls the entry-point function for each loaded DLL using the first thread of the process. When the DLL is being loaded into the virtual address space of the current process as a result of the process starting up, we make a call to DisableThreadLibraryCalls() which should disable the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified dynamic-link library (DLL). This seems not to be the case because we never check the return value of the DisableThreadLibraryCalls() call, and it could in fact fail. The DisableThreadLibraryCalls() function fails if the DLL specified by hModule has active static thread local storage, or if hModule is an invalid module handle. In this commit, we remove the safe-guard assertion put in place for the DLL_THREAD_ATTACH and DLL_THREAD_DETACH events and we just ignore them. BIND 9 doesn't create/destroy enough threads for it actually to make any difference, and in fact we do use static thread local storage in the code.	2021-02-26 17:18:06 +01:00
Ondřej Surý	2f0f531ee8	Use library constructor/destructor to initialize OpenSSL Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use gcc/clang attributes on POSIX and DLLMain on Windows to initialize and shutdown OpenSSL library. This resolves the issue when isc_nm_create() / isc_nm_destroy() was called multiple times and it would call OpenSSL library destructors from isc_nm_destroy(). At the same time, since we now have introduced the ctor/dtor for libisc, this commit moves the isc_mem API initialization (the list of the contexts) and changes the isc_mem_checkdestroyed() to schedule the checking of memory context on library unload instead of executing the code immediately.	2021-02-26 17:18:06 +01:00
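The constructor/destructor approach described in `2f0f531ee8` can be illustrated with a minimal sketch. It assumes a gcc- or clang-compatible compiler on POSIX; the names `demo_lib_initialize`, `demo_lib_shutdown`, and `demo_lib_is_initialized` are hypothetical stand-ins and are not taken from libisc, and the flag merely represents whatever state the real code sets up (such as the OpenSSL library).

```c
/* Hypothetical stand-in for library-wide state (e.g. an initialized TLS stack). */
static int demo_lib_initialized = 0;

/*
 * With gcc/clang, a function marked with the constructor attribute runs
 * automatically before main() or when the shared object is loaded.
 */
__attribute__((constructor)) static void
demo_lib_initialize(void) {
	demo_lib_initialized = 1;
}

/*
 * A function marked with the destructor attribute runs automatically after
 * main() returns or when the shared object is unloaded.
 */
__attribute__((destructor)) static void
demo_lib_shutdown(void) {
	demo_lib_initialized = 0;
}

/* Report whether the constructor has already run. */
int
demo_lib_is_initialized(void) {
	return demo_lib_initialized;
}
```

Because initialization is tied to library load rather than to an explicit call, a caller that creates and destroys a subsystem several times cannot trigger the library teardown early.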
Ondřej Surý	59d51cdeb6	Disable calling DllMain() on thread creation/destruction Disables the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified dynamic-link library (DLL). This can reduce the size of the working set for some applications.	2021-02-26 17:07:15 +01:00
Ondřej Surý	6883d4bd08	Fix the invalid condition variable Although harmless, the memmove() in tlsdns and tcpdns was guarded by a current message length variable that was always bigger than 0 instead of the correct current buffer length remainder variable.	2021-02-26 17:07:09 +01:00
Ondřej Surý	6da91bf16c	Move most of the OpenSSL initialization to isc_tls Since we now require both libcrypto and libssl to be initialized for netmgr, we move all the OpenSSL initialization code except the engine initialization to isc_tls API. The isc_tls_initialize() and isc_tls_destroy() functions have been made idempotent, so they can be called multiple times. However, once isc_tls_destroy() has been called, isc_tls_initialize() cannot be called again.	2021-02-26 17:07:01 +01:00
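The idempotency contract described in `6da91bf16c` (repeated calls are harmless, but no re-initialization after destroy) can be sketched as follows. This is an illustration under stated assumptions, not the libisc code: the `demo_` names and the counter are hypothetical, and the sketch is deliberately single-threaded, so real code would need atomics or a once-guard.

```c
#include <stdbool.h>

static bool demo_tls_initialized = false;
static bool demo_tls_shutdown = false;
static int demo_real_inits = 0; /* how many times the real setup work ran */

/*
 * Initialize once. Further calls are no-ops that still report success.
 * After demo_tls_destroy() has taken effect, initialization is refused.
 */
bool
demo_tls_initialize(void) {
	if (demo_tls_shutdown) {
		return false;
	}
	if (!demo_tls_initialized) {
		demo_real_inits++; /* the actual library setup would go here */
		demo_tls_initialized = true;
	}
	return true;
}

/* Tear down once. Calls before initialization or after teardown do nothing. */
void
demo_tls_destroy(void) {
	if (!demo_tls_initialized || demo_tls_shutdown) {
		return;
	}
	demo_tls_shutdown = true; /* the actual library teardown would go here */
}

/* Expose the counter so callers can confirm setup ran exactly once. */
int
demo_tls_real_inits(void) {
	return demo_real_inits;
}
```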
Ondřej Surý	effe3ee595	Refactor TLSDNS module to work with libuv/ssl directly * Following the example set in `634bdfb16d`, the tlsdns netmgr module now uses libuv and SSL primitives directly, rather than opening a TLS socket which opens a TCP socket, as the previous model was difficult to debug. Closes #2335. * Remove the netmgr tls layer (we will have to re-add it for DoH) * Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc library; move the OpenSSL initialization/deinitialization from dstapi needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}() * Add a couple of new shims needed for OpenSSL 1.0.x * When LibreSSL is used, require at least version 2.7.0 that has the best OpenSSL 1.1.x compatibility and auto init/deinit * Enforce OpenSSL 1.1.x usage on Windows (cherry picked from commit `e493e04c0f`)	2021-02-26 16:14:50 +01:00
Matthijs Mekking	acc95d4e1d	Don't servfail on staleonly lookups When a staleonly lookup doesn't find a satisfying answer, it should not try to respond to the client. This is not true when the initial lookup is staleonly (that is when 'stale-answer-client-timeout' is set to 0), because no resolver fetch has been created at this point. In this case continue with the lookup normally. (cherry picked from commit `f8b7b597e9`)	2021-02-25 12:07:34 +01:00
Matthijs Mekking	84deb57bc3	Don't allow recursion on staleonly lookups Fix a crash that can happen in the following scenario: A client request is received. There is no data for it in the cache, (not even stale data). A resolver fetch is created as part of recursion. Some time later, the fetch still hasn't completed, and stale-answer-client-timeout is triggered. A staleonly lookup is started. It will also find no data in the cache. So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND, so this will call 'query_notfound()' and this will start recursion. We will eventually end up in 'ns_query_recurse()' and that requires the client query fetch to be NULL: REQUIRE(client->query.fetch == NULL); If the previously started fetch is still running this assertion fails. The crash is easily prevented by not requiring recursion for staleonly lookups. Also remove a redundant setting of the staleonly flag at the end of 'query_lookup_staleonly()' before destroying the query context. Add a system test to catch this case. (cherry picked from commit `9e061faaae`)	2021-02-25 12:07:27 +01:00
Matthijs Mekking	d51df34995	Fix dnssec-policy NSEC3 on dynamic zones When applying dnssec-policy on a dynamic zone (e.g. that allows Dynamic Updates), the NSEC3 parameters were put on the queue, but they were not being processed (until a reload of the zone or reconfiguration). Process the NSEC3PARAM queue on zone postload when handling a dynamic zone. (cherry picked from commit `4b176c850b`)	2021-02-25 10:56:00 +01:00
Michal Nowak	b36690e750	Initialize checknames field in dns_view_create() The 'checknames' field wasn't initialized in dns_view_create(), but it should be; otherwise AddressSanitizer identifies the following runtime error in query_test.c: runtime error: load of value 190, which is not a valid value for type '_Bool' (cherry picked from commit `0c6fa16477`)	2021-02-23 16:45:36 +01:00
Matthijs Mekking	8e4d0d51be	Purge keys implementation On each keymgr run, we now also check if key files can be removed. The 'purge-keys' interval determines how long keys should be retained after they have become completely hidden. Key files should not be removed if the key has a state that is set to something other than HIDDEN, if purge-keys is 0 (disabled), if the key goal is set to OMNIPRESENT, or if the key is unused (a key is unused if no timing metadata is set, and no states are set or, if set, they are set to HIDDEN). If the last changed timing metadata plus the purge-keys interval is in the past, the key files may be removed. Add a dst_key_t variable 'purge' to signal that the key file should not be written to file again. (cherry picked from commit `8c526cb67f`)	2021-02-23 09:19:03 +01:00
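The purge conditions listed in `8e4d0d51be` can be read as a single predicate. The sketch below is one possible reading, not the keymgr implementation: the `demo_` types, the four-element state array, and the single `last_changed` timestamp are simplifying assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key states, named after the states in the commit messages. */
typedef enum {
	DEMO_NA, /* state not set */
	DEMO_HIDDEN,
	DEMO_RUMOURED,
	DEMO_OMNIPRESENT,
	DEMO_UNRETENTIVE
} demo_state_t;

#define DEMO_NUM_KEYSTATES 4 /* assumed number of per-key state fields */

typedef struct {
	demo_state_t goal;
	demo_state_t states[DEMO_NUM_KEYSTATES];
	bool has_timing_metadata; /* false models an "unused" key */
	uint64_t last_changed;    /* latest timing metadata change, in seconds */
} demo_key_t;

/* Return true if the key files may be removed at time 'now'. */
bool
demo_key_can_be_purged(const demo_key_t *key, uint64_t purge_keys,
		       uint64_t now) {
	if (purge_keys == 0) {
		return false; /* purging is disabled */
	}
	if (key->goal == DEMO_OMNIPRESENT) {
		return false; /* the key is still wanted */
	}
	for (int i = 0; i < DEMO_NUM_KEYSTATES; i++) {
		if (key->states[i] != DEMO_NA && key->states[i] != DEMO_HIDDEN) {
			return false; /* some record may still be visible */
		}
	}
	if (!key->has_timing_metadata) {
		return false; /* unused key: keep its files */
	}
	/* Purge only once the retention interval lies fully in the past. */
	return key->last_changed + purge_keys < now;
}
```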
Matthijs Mekking	45dcabf411	Add purge-keys config option Add a new option 'purge-keys' to 'dnssec-policy' that will purge key files for deleted keys. The option determines how long key files should be retained prior to removing the corresponding files from disk. If set to 0, the option is disabled and 'named' will not remove key files from disk. (cherry picked from commit `313de3a7e2`)	2021-02-23 09:18:55 +01:00
Mark Andrews	4e192d2fe7	Address theoretical resource leak in dns_dt_open() dns_dt_open() is not currently called with mode dns_dtmode_unix. *** CID 281489: Resource leaks (RESOURCE_LEAK) /lib/dns/dnstap.c: 983 in dns_dt_open() 977 978 if (!dnstap_file(handle->reader)) { 979 CHECK(DNS_R_BADDNSTAP); 980 } 981 break; 982 case dns_dtmode_unix: CID 281489: Resource leaks (RESOURCE_LEAK) Variable "handle" going out of scope leaks the storage it points to. 983 return (ISC_R_NOTIMPLEMENTED); 984 default: 985 INSIST(0); 986 ISC_UNREACHABLE(); 987 } 988 (cherry picked from commit `003dd8cc70`)	2021-02-23 09:41:15 +11:00
Mark Andrews	d68b85e555	Correctly detect when get_direction failed (cherry picked from commit `009358d77d`)	2021-02-19 11:39:12 +11:00
Mark Andrews	0cea486327	Test a LOC record with an invalid direction field (cherry picked from commit `07902d9f9d`)	2021-02-19 11:39:12 +11:00
Ondřej Surý	b04cb88462	Fix off-by-one bug in ISC SPNEGO implementation The ISC SPNEGO implementation is based on mod_auth_kerb code. When CVE-2006-5989 was disclosed, the relevant fix was not applied to the BIND 9 codebase, making the latter vulnerable to the aforementioned flaw when "tkey-gssapi-keytab" or "tkey-gssapi-credential" is set in named.conf. The original description of CVE-2006-5989 was: Off-by-one error in the der_get_oid function in mod_auth_kerb 5.0 allows remote attackers to cause a denial of service (crash) via a crafted Kerberos message that triggers a heap-based buffer overflow in the component array. Later research revealed that this flaw also theoretically enables remote code execution, though achieving the latter in real-world conditions is currently deemed very difficult. This vulnerability was responsibly reported as ZDI-CAN-12302 ("ISC BIND TKEY Query Heap-based Buffer Overflow Remote Code Execution Vulnerability") by Trend Micro Zero Day Initiative.	2021-02-17 22:36:08 +01:00
Ondřej Surý	d7b3a6a016	Rollback setting IP_DONTFRAG option on the UDP sockets In DNS Flag Day 2020, the development branch started setting the IP_DONTFRAG option on the UDP sockets. It turned out that this code was incomplete, leading to outgoing UDP packets being dropped. Hence this commit rolls back this setting until we have a proper fix that would send back an empty response with the TC flag set. (cherry picked from commit `66eefac78c`)	2021-02-17 14:41:56 +01:00
Michal Nowak	f483b102dd	Drop USE_OPENSSL constraint from dh_test The USE_OPENSSL constraint in dh_test does not seem to be necessary anymore; the test runs with PKCS#11 as well. (cherry picked from commit `c341e7f740`)	2021-02-17 12:46:25 +01:00
Michal Nowak	ed38e32b69	Ensure dnstap_test returns SKIPPED_TEST_EXIT_CODE Make sure lib/dns/tests/dnstap_test returns an exit code that indicates a skipped test when dnstap is not enabled. (cherry picked from commit `c286341703`)	2021-02-17 12:15:18 +01:00
Michal Nowak	04aff208fb	Use BIND 9.17 preprocessor macro to skip unit test BIND 9.17 changed exit code of skipped test to meet Automake expectations in `fa505bfb0e`. BIND 9.16 was not rewritten to Automake, but for consistency reasons, the same SKIPPED_TEST_EXIT_CODE preprocessor macro is used (though the actual exit code differs from the one in BIND 9.17). (cherry picked from commit `fa505bfb0e`)	2021-02-17 12:09:25 +01:00
Mark Andrews	d51b78c85b	Stop including <gssapi.h> from <dst/gssapi.h> header The only reason for including the gssapi.h from the dst/gssapi.h header was to get the typedefs of gss_cred_id_t and gss_ctx_id_t. Instead of using those types directly this commit introduces dns_gss_cred_id_t and dns_gss_ctx_id_t types that are being used in the public API and privately retyped to their counterparts when we actually call the gss api. This also conceals the gssapi headers, so users of the libdns library don't have to add GSSAPI_CFLAGS to the Makefile when including libdns dst API.	2021-02-16 12:08:21 +11:00
Ondřej Surý	4bbe3e75de	Stop including dnstap headers from <dns/dnstap.h> The <fstrm.h> and <protobuf-c/protobuf-c.h> headers are only directly included where used and we stopped exposing those headers from libdns headers.	2021-02-16 12:08:21 +11:00
Mark Andrews	bf5aac225b	Stop including <lmdb.h> from <dns/lmdb.h> The lmdb.h header doesn't have to be included from the dns/lmdb.h header as it can be separately included where used. This stops exposing the inclusion of lmdb.h from the libdns headers.	2021-02-16 12:08:21 +11:00
Mark Andrews	b8fc8742e5	Re-order include directories ${FSTRM_CFLAGS} ${PROTOBUF_C_CFLAGS} ${OPENSSL_CFLAGS} ${LMDB_CFLAGS} need to appear after all directories in the build tree.	2021-02-16 12:08:21 +11:00
Diego Fronza	d89a8bf696	Fix dangling references to outdated views after reconfig This commit fixes a leak which was happening every time an inline-signed zone was added to the configuration, followed by a rndc reconfig. During the reconfig process, the secure version of every inline-signed zone was "moved" to a new view upon a reconfig and it "took the raw version along", but only once the secure version was freed (at shutdown) was prev_view for the raw version detached from, causing the old view to be released as well. This caused dangling references to be kept for the previous view, thus keeping all resources used by that view in memory.	2021-02-15 11:52:50 -03:00
Mark Andrews	6e30caed57	Silence Insecure data handling (TAINTED_SCALAR) Coverity assumes that the memory holding any value read using byte swapping is tainted. As we store the NSEC3PARAM records in wire form and iterations is byte swapped the memory holding the record is marked as tainted. nsec3->salt_length is marked as tainted transitively. To remove the taint the value needs to be range checked. For a correctly formatted record region.length should match nsec3->salt_length and provides a convenient value to check the field against. *** CID 316507: Insecure data handling (TAINTED_SCALAR) /lib/dns/rdata/generic/nsec3param_51.c: 241 in tostruct_nsec3param() 235 region.length = rdata->length; 236 nsec3param->hash = uint8_consume_fromregion(&region); 237 nsec3param->flags = uint8_consume_fromregion(&region); 238 nsec3param->iterations = uint16_consume_fromregion(&region); 239 240 nsec3param->salt_length = uint8_consume_fromregion(&region); >>> CID 316507: Insecure data handling (TAINTED_SCALAR) >>> Passing tainted expression "nsec3param->salt_length" to "mem_maybedup", which uses it as an offset. 241 nsec3param->salt = mem_maybedup(mctx, region.base, 242 nsec3param->salt_length); 243 if (nsec3param->salt == NULL) { 244 return (ISC_R_NOMEMORY); 245 } 246 isc_region_consume(&region, nsec3param->salt_length); (cherry picked from commit `c40133d840`)	2021-02-12 10:43:19 +11:00
Mark Andrews	8302e9fb69	Silence Untrusted value as argument (TAINTED_SCALAR) Coverity assumes that the memory holding any value read using byte swapping is tainted. As we store the NSEC3 records in wire form and iterations is byte swapped the memory holding the record is marked as tainted. nsec3->salt_length and nsec3->next_length are marked as tainted transitively. To remove the taint the values need to be range checked. Valid values for these should never exceed region.length so that becomes a reasonable value to check against. *** CID 316509: (TAINTED_SCALAR) /lib/dns/rdata/generic/nsec3_50.c: 312 in tostruct_nsec3() 306 if (nsec3->salt == NULL) { 307 return (ISC_R_NOMEMORY); 308 } 309 isc_region_consume(&region, nsec3->salt_length); 310 311 nsec3->next_length = uint8_consume_fromregion(&region); >>> CID 316509: (TAINTED_SCALAR) >>> Passing tainted expression "nsec3->next_length" to "mem_maybedup", which uses it as an offset. 312 nsec3->next = mem_maybedup(mctx, region.base, nsec3->next_length); 313 if (nsec3->next == NULL) { 314 goto cleanup; 315 } 316 isc_region_consume(&region, nsec3->next_length); 317 /lib/dns/rdata/generic/nsec3_50.c: 305 in tostruct_nsec3() 299 region.length = rdata->length; 300 nsec3->hash = uint8_consume_fromregion(&region); 301 nsec3->flags = uint8_consume_fromregion(&region); 302 nsec3->iterations = uint16_consume_fromregion(&region); 303 304 nsec3->salt_length = uint8_consume_fromregion(&region); >>> CID 316509: (TAINTED_SCALAR) >>> Passing tainted expression "nsec3->salt_length" to "mem_maybedup", which uses it as an offset. 305 nsec3->salt = mem_maybedup(mctx, region.base, nsec3->salt_length); 306 if (nsec3->salt == NULL) { 307 return (ISC_R_NOMEMORY); 308 } 309 isc_region_consume(&region, nsec3->salt_length); 310 (cherry picked from commit `fd8d1337a5`)	2021-02-12 10:43:19 +11:00
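The range check that both Coverity fixes rely on (a length byte read from wire data must not exceed what remains in the region) can be shown in isolation. The following sketch does not use the BIND `isc_region_t` API; `demo_region_t` and `demo_consume_lenprefixed` are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a (base, length) view over wire-format data. */
typedef struct {
	const uint8_t *base;
	size_t length;
} demo_region_t;

/*
 * Consume a one-byte length field followed by that many bytes of payload.
 * The length is checked against the remaining region before it is used as
 * an offset. Returns 0 on success and -1 on a malformed record.
 */
int
demo_consume_lenprefixed(demo_region_t *r, const uint8_t **payload,
			 uint8_t *payload_length) {
	if (r->length < 1) {
		return -1; /* no room for the length byte itself */
	}
	uint8_t len = r->base[0];
	r->base += 1;
	r->length -= 1;

	if (len > r->length) {
		return -1; /* claimed length exceeds the remaining data */
	}
	*payload = r->base;
	*payload_length = len;
	r->base += len;
	r->length -= len;
	return 0;
}
```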
Michal Nowak	001413ed50	Drop AddressSanitizer constraint from libns unit tests The AddressSanitizer constraint in some libns unit tests does not seem to be necessary anymore, these tests run fine under AddressSanitizer. (cherry picked from commit `613be8706e`)	2021-02-10 11:03:27 +01:00
Matthijs Mekking	2afaff75ed	Use stale on error also when unable to recurse The 'query_usestale()' function was only called when in 'query_gotanswer()' and an unexpected error occurred. This may have been "quota reached", and thus we were in some cases returning stale data on fetch-limits (and if serve-stale enabled of course). But we can also hit fetch-limits when recursing because we are following a referral (in 'query_notfound()' and 'query_delegation_recurse()'). Here we should also check for using stale data in case an error occurred. Specifically don't check for using stale data when refetching a zero TTL RRset from cache. Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()' function to avoid code duplication. (cherry picked from commit `8bcd7fe69e`)	2021-02-08 16:10:03 +01:00
Matthijs Mekking	dbf5428629	Only start stale refresh window when resuming If we did not attempt a fetch due to fetch-limits, we should not start the stale-refresh-time window. Introduce a new flag DNS_DBFIND_STALESTART to differentiate between a resolver failure and unexpected error. If we are resuming, this indicates a resolver failure, then start the stale-refresh-time window, otherwise don't start the stale-refresh-time window, but still fall back to using stale data. (This commit also wraps some docstrings to 80 characters width) (cherry picked from commit `aabdedeae3`)	2021-02-08 16:07:43 +01:00
Matthijs Mekking	809ec0a224	Use stale data also if we are not resuming Before this change, BIND would only fall back to using stale data if there was an actual attempt to resolve the query. Then on a timeout, the stale data from cache becomes eligible. This commit changes this so that on any unexpected error stale data becomes eligible (you would still have to have 'stale-answer-enable' enabled of course). If there is no stale data, this may result in an error again, so don't loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is set, this means we already tried to lookup stale data, so if that is the case, don't use stale again. (cherry picked from commit `c6fd02aed5`)	2021-02-08 16:07:43 +01:00
Mark Andrews	8092b7eec6	Remove redundant 'version == NULL' check *** CID 318094: Null pointer dereferences (REVERSE_INULL) /lib/dns/rbtdb.c: 1389 in newversion() 1383 version->xfrsize = rbtdb->current_version->xfrsize; 1384 RWUNLOCK(&rbtdb->current_version->rwlock, isc_rwlocktype_read); 1385 rbtdb->next_serial++; 1386 rbtdb->future_version = version; 1387 RBTDB_UNLOCK(&rbtdb->lock, isc_rwlocktype_write); 1388 CID 318094: Null pointer dereferences (REVERSE_INULL) Null-checking "version" suggests that it may be null, but it has already been dereferenced on all paths leading to the check. 1389 if (version == NULL) { 1390 return (result); 1391 } 1392 1393 *versionp = version; 1394 (cherry picked from commit `456d53d1fb`)	2021-02-08 16:17:52 +11:00
Mark Andrews	a900d79ea8	Cleanup redundant isc_rwlock_init() result checks (cherry picked from commit `3b11bacbb7`)	2021-02-08 15:13:49 +11:00
Mark Andrews	c2a5b88275	Attempt to silence untrusted loop bound Assign hit_len + key_len to len and test the result rather than recomputing and letting the compiler simplify. 213 isc_region_consume(&region, 2); /* hit length + algorithm */ 9. tainted_return_value: Function uint16_fromregion returns tainted data. 10. tainted_data_transitive: Call to function uint16_fromregion with tainted argument region.base returns tainted data. 11. tainted_return_value: Function uint16_fromregion returns tainted data. 12. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data. 13. var_assign: Assigning: key_len = uint16_fromregion(&region), which taints key_len. 214 key_len = uint16_fromregion(&region); 14. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 15. Condition key_len == 0, taking false branch. 215 if (key_len == 0) { 216 RETERR(DNS_R_FORMERR); 217 } 16. Condition !!(_r->length >= _l), taking true branch. 17. Condition !!(_r->length >= _l), taking true branch. 218 isc_region_consume(&region, 2); 18. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 19. Condition region.length < (unsigned int)(hit_len + key_len), taking false branch. 219 if (region.length < (unsigned)(hit_len + key_len)) { 220 RETERR(DNS_R_FORMERR); 221 } 222 20. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 21. Condition _r != 0, taking false branch. 223 RETERR(mem_tobuffer(target, rr.base, 4 + hit_len + key_len)); 22. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 23. var_assign_var: Compound assignment involving tainted variable 4 + hit_len + key_len to variable source->current taints source->current. 
224 isc_buffer_forward(source, 4 + hit_len + key_len); 225 226 dns_decompress_setmethods(dctx, DNS_COMPRESS_NONE); CID 281461 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR) 24. tainted_data: Using tainted variable source->active - source->current as a loop boundary. Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range. 227 while (isc_buffer_activelength(source) > 0) { 228 dns_name_init(&name, NULL); 229 RETERR(dns_name_fromwire(&name, source, dctx, options, target)); 230 } (cherry picked from commit `2f946c831a`)	2021-02-08 14:05:11 +11:00
Mark Andrews	6da9f238d4	Allow A records below '_spf' labels as recommended by RFC 7208 (cherry picked from commit `63c16c8506`)	2021-02-03 16:26:25 +01:00
Matthijs Mekking	ce2a37a990	Use NUM_KEYSTATES constant where appropriate We use the number 4 a lot when working on key states. Better to use the NUM_KEYSTATES constant instead. (cherry picked from commit `98ace6d97d`)	2021-02-03 15:48:20 +01:00
Matthijs Mekking	a8fba11da9	Cleanup keymgr.c Three small cleanups: 1. Remove an unused keystr/dst_key_format. 2. Initialize a dst_key_state_t state with NA. 3. Update false comment about local policy (local policy only adds barrier on transitions to the RUMOURED state, not the UNRETENTIVE state). (cherry picked from commit `189d9a2d21`)	2021-02-03 15:47:40 +01:00
Matthijs Mekking	ceac392e19	Fix DS/DNSKEY hidden or chained functions There was a bug in function 'keymgr_ds_hidden_or_chained()'. The function 'keymgr_ds_hidden_or_chained()' implements (3e) of rule2 as defined in the "Flexible and Robust Key Rollover" paper. The rule says: All DS records need to be in the HIDDEN state, or if one is not, there must be a key with its DNSKEY and KRRSIG in OMNIPRESENT, and its DS in the same state as the key in question. In human language, if all keys have their DS in HIDDEN state you can do what you want, but if a DS record is available to some validators, there must be a chain of trust for it. Note that the barriers on transitions first check if the current state is valid, and then if the next state is valid too. But here we falsely updated the 'dnskey_omnipresent' (now 'dnskey_chained') with the next state. The next state applies to 'key' not to the state to be checked. Updating the state here leads to (true) always, because the key that will move its state will match the falsely updated expected state. This could lead to the assumption that Key 2 would be a valid chain of trust for Key 1, while clearly the presence of any DS is uncertain. The fix here is to check if the DNSKEY and KRRSIG are in OMNIPRESENT state for the key that does not have its DS in the HIDDEN state, and only if that is not the case, ensure that there is a key with the same algorithm, that provides a valid chain of trust, that is, has its DNSKEY, KRRSIG, and DS in OMNIPRESENT state. The changes in 'keymgr_dnskey_hidden_or_chained()' are only cosmetic, renaming 'rrsig_omnipresent' to 'rrsig_chained' and removing the redundant initialization of the DST_KEY_DNSKEY expected state to NA. (cherry picked from commit `291bcc3721`)	2021-02-03 15:47:30 +01:00
Matthijs Mekking	6ff0e99fa7	Update keymgr_key_is_successor() calls The previous commit changed the function definition of 'keymgr_key_is_successor()', this commit updates the code where this function is called. In 'keymgr_key_exists_with_state()' the logic is also updated slightly to become more readable. First handle the easy cases: - If the key does not match the state, continue with the next key. - If we found a key with matching state, and there is no need to check the successor relationship, return (true). - Otherwise check the successor relationship. In 'keymgr_key_has_successor()' it is enough to check if a key has a direct successor, so instead of calling 'keymgr_key_is_successor()', we can just check 'keymgr_direct_dep()'. In 'dns_keymgr_run()', we want to make sure that there is no dependency on the keys before retiring excess keys, so replace 'keymgr_key_is_successor()' with 'keymgr_dep()'. (cherry picked from commit `600915d1b2`)	2021-02-03 15:47:23 +01:00
Matthijs Mekking	5e40515671	Implement Equation(2) of "Flexible Key Rollover" So far the key manager could only deal with two keys in a rollover, because it used a simplified version of the successor relationship equation from "Flexible and Robust Key Rollover" paper. The simplified version assumes only two keys take part in the key rollover and for that it is enough to check the direct relationship between two keys (is key x the direct predecessor of key z and is key z the direct successor of key x?). But when a third key (or more keys) comes into the equation, the key manager would assume that one key (or more) is redundant and remove it from the zone prematurely. Fix by implementing Equation(2) correctly, where we check for dependencies on keys: z ->T x: Dep(x, T) = ∅ ∧ (x ∈ Dep(z, T) ∨ ∃ y ∈ Dep(z, T)(y != z ∧ y ->T x ∧ DyKyRySy = DzKzRzSz)) This says: key z is a successor of key x if: - key x depends on key z if z is a direct successor of x, - or if there is another key y that depends on key z that has identical key states as key z and key y is a successor of key x. - Also, key x may not have any other keys depending on it. This is still a simplified version of Equation(2) (but at least much better), because the paper allows for a set of keys to depend on a key. This is defined as the set Dep(x, T). Keys in the set Dep(x, T) have a dependency on key x for record type T. The BIND implementation can only have one key in the set Dep(x, T). The function 'keymgr_dep()' stores this key in 'uint32_t *dep' if there is a dependency. There are two scenarios where multiple keys can depend on a single key: 1. Rolling keys is faster than the time required to finish the rollover procedure. This scenario is covered by the recursive implementation, and checking for a chain of direct dependencies will suffice. 2. Changing the policy, when a zone is requested to be signed with a different key length for example. 
BIND 9 will not mark successor relationships in this case, but tries to move towards the new policy. Since there is no successor relationship, the rules are even more strict, and the DNSSEC reconfiguration is actually slower than required. Note: this commit breaks the build, because the function definition of 'keymgr_key_is_successor' changed. This will be fixed in the following commit. (cherry picked from commit `cc38527b63`)	2021-02-03 15:47:14 +01:00
Matthijs Mekking	3f6dafe1f4	Remove initialize goal code Since keys now have their goals initialized in 'keymgr_key_init()', remove this redundant piece of code in 'keymgr_key_run()'. (cherry picked from commit `82632fa6d9`)	2021-02-03 08:42:51 +01:00
Matthijs Mekking	4170288a91	Correctly initialize old key with state file The 'key_init()' function is used to initialize a state file for keys that don't have one yet. This can happen if you are migrating from an 'auto-dnssec' or 'inline-signing' to a 'dnssec-policy' configuration. It did not look at the "Inactive" and "Delete" timing metadata and so old keys left behind in the key directory would also be considered as a possible active key. This commit fixes this and now explicitly sets the key goal to OMNIPRESENT for keys that have their "Active/Publish" timing metadata in the past, but their "Inactive/Delete" timing metadata in the future. If the "Inactive/Delete" timing metadata is also in the past, the key goal is set to HIDDEN. If the "Inactive/Delete" timing metadata is in the past, the key states are also adjusted to either UNRETENTIVE or HIDDEN, depending on how far in the past the metadata is set. (cherry picked from commit `76cf72e65a`)	2021-02-03 08:42:32 +01:00
Diego Fronza	51663408bc	Fix race condition on check_stale_header This commit fixes a race that could happen when two or more threads have failed to refresh the same RRset: the threads could simultaneously attempt to update the header->last_refresh_fail_ts field in check_stale_header, a field used to implement stale-refresh-time. By making this field atomic we avoid such a race. (cherry picked from commit `c75575e350`)	2021-01-29 15:29:00 +01:00
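The race fix in `51663408bc` amounts to making a shared timestamp field atomic. A minimal C11 sketch of that idea follows; it assumes `<stdatomic.h>` is available, and the struct, the function names, and the window test are hypothetical simplifications rather than the rbtdb code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cache header shared between worker threads. */
typedef struct {
	atomic_uint_fast32_t last_refresh_fail_ts; /* 0 means "never failed" */
} demo_header_t;

/* Record a failed refresh. Concurrent callers cannot tear the value. */
void
demo_mark_refresh_failed(demo_header_t *h, uint32_t now) {
	atomic_store_explicit(&h->last_refresh_fail_ts, now,
			      memory_order_release);
}

/*
 * Return true if 'now' still falls inside a window of 'window' seconds
 * that started at the last recorded refresh failure.
 */
bool
demo_in_refresh_window(demo_header_t *h, uint32_t now, uint32_t window) {
	uint32_t ts = (uint32_t)atomic_load_explicit(&h->last_refresh_fail_ts,
						     memory_order_acquire);
	return ts != 0 && (uint32_t)(now - ts) < window;
}
```

With a plain integer field, two threads writing at once would be a data race in the C memory model; the atomic type makes each store and load indivisible.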
Matthijs Mekking	99c72bf5da	Update code flow in query.c wrt stale data First of all, there was a flaw in the code related to the 'stale-refresh-time' option. If stale answers are enabled, and we returned stale data, then it was assumed that it was because we were in the 'stale-refresh-time' window. But now we could also have returned stale data because of a 'stale-answer-client-timeout'. To fix this, introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to indicate whether the stale cache entry was returned because the 'stale-refresh-time' window is active. Second, remove the special case handling when the result is DNS_R_NCACHENXRRSET. This can be done more generically in the code block that deals with stale data. Putting all stale case handling in the code block that deals with stale data makes the code easier to follow. Update documentation to be more verbose and to match the new code flow. (cherry picked from commit `fa0c9280d2`)	2021-01-29 10:43:41 +01:00
Diego Fronza	0e62c53c5b	Extracted common function from query_lookup and query_refresh_rrset Both functions employed the same code lines to allocate query context buffers, which are used to store query results, so this shared portion of code was extracted out to a new function, qctx_prepare_buffers. Also, this commit uses qctx_init to initialize the query context within the query_refresh_rrset function. (cherry picked from commit `966060c03b`)	2021-01-29 10:43:27 +01:00
Diego Fronza	5cbb28a40e	Small optimization in query_usestale This commit makes the code in query_usestale easier to follow, it also doesn't attach/detach to the database if stale answers are not enabled. (cherry picked from commit `f89ac07b28`)	2021-01-29 10:41:39 +01:00
Diego Fronza	8324c3ddfe	Allow stale data to be used before name resolution This commit allows stale RRset to be used (if available) for responding to a query, before an attempt to refresh an expired, or otherwise resolve an unavailable RRset in cache is made. For that to work, a value of zero must be specified for stale-answer-client-timeout statement. To better understand the logic implemented, there are three flags being used during database lookup and other parts of code that must be understood: . DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a RRset due to timeout (resolver-query-timeout), its intent is to try to look for stale data in cache as a fallback, but only if stale answers are enabled in configuration. This flag is also used to activate stale-refresh-time window, since it is the only way the database knows that a resolution has failed. . DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database that it may use stale data. It is always set during query lookup if stale answers are enabled, but only effectively used during stale-refresh-time window. Also during this window, the resolver will not try to resolve the query, in other words no attempt to refresh the data in cache is made when the stale-refresh-time window is active. . DNS_DBFIND_STALEONLY: This newly introduced flag is used when we want stale data from the database, but not due to a failure in resolution, it also doesn't require stale-refresh-time window timer to be active. As long as there is a stale RRset available, it should be returned. It is mainly used in two situations: 1. When stale-answer-client-timeout timer is triggered: in that case we want to know if there is stale data available to answer the client. 2. When stale-answer-client-timeout value is set to zero: in that case, we also want to know if there is some stale RRset available to promptly answer the client. 
We must also distinguish between three situations that may arise when resolving a query after the addition of the stale-answer-client-timeout statement, and decide how to handle each: 1. Are we running query_lookup() because the stale-answer-client-timeout timer was triggered? In this case, we look for stale data, making use of the DNS_DBFIND_STALEONLY flag. If a stale RRset is available, respond to the client with the data found and mark this query as answered (query attribute NS_QUERYATTR_ANSWERED), so that the client won't be answered twice when the fetch completes. We must also take care not to detach from the client, as a fetch will still be running in the background; this is handled by the following snippet: if (!QUERY_STALEONLY(&client->query)) { isc_nmhandle_detach(&client->reqhandle); } This tests whether the DNS_DBFIND_STALEONLY flag is set, which would mean we are here due to a stale-answer-client-timeout timer expiration; we detach only when it is not set. 2. Are we running query_lookup() because resolver-query-timeout was triggered? In this case, the DNS_DBFIND_STALEOK flag will be set and an attempt to look for stale data will be made. As already explained, this flag is also used to activate the stale-refresh-time window, as it means that we failed to refresh an RRset due to a timeout. It is ok in this situation to detach from the client, as the fetch is already completed. 3. Are we running query_lookup() for the first time, looking for an RRset in cache, with the stale-answer-client-timeout value set to zero? In this case, if stale answers are enabled (probably), we must do an initial database lookup with the DNS_DBFIND_STALEONLY flag set, to indicate to the database that we want stale data. If we find an active RRset, proceed as normal: answer the client and the query is done. If we find a stale RRset, we respond to the client and mark the query as answered, but don't detach from the client yet, as an attempt to refresh the RRset will still be made by means of the newly introduced function 'query_resolve'. 
If no active or stale RRset is available, begin resolution as usual. (cherry picked from commit `e219422575`)	2021-01-29 10:39:09 +01:00
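The three situations described in the commit message above can be sketched as a small flag-selection function. This is only an illustration under stated assumptions, not the code from the commit: the `SKETCH_*` bit values, the `lookup_reason_t` enum and `sketch_lookup_flags()` are hypothetical names invented here, and the real DNS_DBFIND_* constants and their numeric values live in the BIND sources.

```c
#include <stdbool.h>

/*
 * Placeholder bit values (assumption): the commit message names the
 * flags but does not give their numeric values.
 */
#define SKETCH_DBFIND_STALEOK      0x01u
#define SKETCH_DBFIND_STALEENABLED 0x02u
#define SKETCH_DBFIND_STALEONLY    0x04u

/* Hypothetical reasons for running the lookup, one per situation. */
typedef enum {
	LOOKUP_INITIAL,          /* first lookup for the query */
	LOOKUP_CLIENT_TIMEOUT,   /* stale-answer-client-timeout timer fired */
	LOOKUP_RESOLVER_TIMEOUT  /* resolver-query-timeout fired */
} lookup_reason_t;

/*
 * Pick the stale-related lookup flags for one database lookup.
 * Simplification: when stale answers are disabled, no stale flag is set.
 */
unsigned int
sketch_lookup_flags(lookup_reason_t reason, bool stale_enabled,
		    bool client_timeout_is_zero) {
	unsigned int flags = 0;

	if (!stale_enabled) {
		return (flags);
	}

	/* Always set as a hint while stale answers are enabled. */
	flags |= SKETCH_DBFIND_STALEENABLED;

	switch (reason) {
	case LOOKUP_CLIENT_TIMEOUT:
		/* Situation 1: only stale data is wanted; fetch still runs. */
		flags |= SKETCH_DBFIND_STALEONLY;
		break;
	case LOOKUP_RESOLVER_TIMEOUT:
		/* Situation 2: the refresh failed, fall back to stale data. */
		flags |= SKETCH_DBFIND_STALEOK;
		break;
	case LOOKUP_INITIAL:
		/* Situation 3: a zero timeout asks for stale data up front. */
		if (client_timeout_is_zero) {
			flags |= SKETCH_DBFIND_STALEONLY;
		}
		break;
	}

	return (flags);
}
```

For example, `sketch_lookup_flags(LOOKUP_RESOLVER_TIMEOUT, true, false)` yields the STALEENABLED and STALEOK bits, while any call with `stale_enabled` false yields zero.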
Diego Fronza	0aebad96b5	Added option for disabling stale-answer-client-timeout This commit allows "disabled" or "off" to be specified in the stale-answer-client-timeout statement. The logic to support this behavior will be added in the subsequent commits. This commit also enforces an upper bound on stale-answer-client-timeout, equal to one second less than 'resolver-query-timeout'. (cherry picked from commit `0ad6f594f6`)	2021-01-29 10:38:58 +01:00
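The upper bound described in the commit message above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the type and function names are hypothetical, both values are taken to be in milliseconds, and the out-of-range value is clamped, whereas the actual commit may handle it differently (for example by rejecting the configuration).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: whether the timeout is active, and its value. */
typedef struct {
	bool	 enabled;
	uint32_t timeout_ms;
} sketch_client_timeout_t;

/*
 * Derive an effective stale-answer-client-timeout.
 * 'disabled_keyword' stands for "disabled" or "off" in the statement.
 * Assumption: both timeouts are expressed in milliseconds.
 */
sketch_client_timeout_t
sketch_client_timeout(bool disabled_keyword, uint32_t requested_ms,
		      uint32_t resolver_query_timeout_ms) {
	sketch_client_timeout_t result = { false, 0 };
	uint32_t upper_ms = 0;

	if (disabled_keyword) {
		return (result);
	}

	/* Upper bound: one second less than resolver-query-timeout. */
	if (resolver_query_timeout_ms > 1000) {
		upper_ms = resolver_query_timeout_ms - 1000;
	}

	result.enabled = true;
	result.timeout_ms = (requested_ms > upper_ms) ? upper_ms : requested_ms;
	return (result);
}
```

With a resolver-query-timeout of 10000 ms, a requested value of 20000 ms comes back as 9000 ms, and a requested value of 1800 ms is returned unchanged.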
Diego Fronza	3478794a5d	Add stale-answer-client-timeout option The general logic behind the addition of this new feature works as follows: When a client query arrives, the basic path (query.c / ns_query_recurse) was to create a fetch and wait for its completion in fetch_callback. With the introduction of stale-answer-client-timeout, a new event of type DNS_EVENT_TRYSTALE may invoke fetch_callback whenever stale answers are enabled and the fetch took longer than stale-answer-client-timeout to complete. When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we must ensure that the following happens: 1. Set up a new query context with the sole purpose of looking up stale-RRset-only data; for that purpose a new flag, 'DNS_DBFIND_STALEONLY', was added for use in database lookups. . If a stale RRset is found, mark the original client query as answered (with a new query attribute named NS_QUERYATTR_ANSWERED), so that when the fetch completion event is received later, we avoid answering the client twice. . If a stale RRset is not found, clean up and wait for the normal fetch completion event. 2. In ns_query_done, we must change this part: /* * If we're recursing then just return; the query will * resume when recursion ends. */ if (RECURSING(qctx->client)) { return (qctx->result); } To this: if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) { return (qctx->result); } Otherwise we would not proceed to answer the client when a stale answer was found while looking up stale-only data. When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we proceed as before, resuming the query, updating stats, etc., but a few exceptions had to be added, the two most important of which are: 1. Before answering the client (ns_client_send), check whether the query was already answered. 2. Before detaching a client, e.g. 
isc_nmhandle_detach(&client->reqhandle), ensure that this is the fetch completion event, and not the one triggered by stale-answer-client-timeout, so a correct call would be: if (!QUERY_STALEONLY(client)) { isc_nmhandle_detach(&client->reqhandle); } Other than these notes, comments were added in the code in an attempt to make these updates easier to follow. (cherry picked from commit `171a5b7542`)	2021-01-29 10:38:32 +01:00
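The "answer once, detach only on fetch completion" rule from the commit message above can be modeled in a few lines. This is a toy model under stated assumptions, not BIND code: `sketch_client_t`, `sketch_event_t`, `sketch_fetch_callback()` and the `SKETCH_QUERYATTR_ANSWERED` bit value are hypothetical stand-ins for the client object, the DNS_EVENT_TRYSTALE / DNS_EVENT_FETCHDONE events, fetch_callback and NS_QUERYATTR_ANSWERED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUERYATTR_ANSWERED 0x01u /* placeholder bit value */

/* Stand-ins for the two event types named in the commit message. */
typedef enum { SKETCH_EVENT_TRYSTALE, SKETCH_EVENT_FETCHDONE } sketch_event_t;

/* Toy client: counts responses instead of sending packets. */
typedef struct {
	unsigned int attributes;
	int	     responses_sent;
	bool	     attached; /* models the client->reqhandle attachment */
} sketch_client_t;

/* Send a response unless the query was already answered. */
static void
sketch_answer_once(sketch_client_t *client) {
	if ((client->attributes & SKETCH_QUERYATTR_ANSWERED) == 0) {
		client->responses_sent++;
		client->attributes |= SKETCH_QUERYATTR_ANSWERED;
	}
}

void
sketch_fetch_callback(sketch_client_t *client, sketch_event_t event,
		      bool stale_found) {
	assert(client != NULL);

	if (event == SKETCH_EVENT_TRYSTALE) {
		/* Timer fired: answer from stale data if there is any. */
		if (stale_found) {
			sketch_answer_once(client);
		}
		/* The fetch is still running, so stay attached. */
		return;
	}

	/* Fetch completion: answer if not done yet, then detach. */
	sketch_answer_once(client);
	client->attached = false;
}
```

Feeding the model a TRYSTALE event with stale data followed by a FETCHDONE event leaves `responses_sent` at 1 and detaches the client only after the second event.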
Diego Fronza	7bf8950a0a	Added dns_view_staleanswerenabled() function Since it takes a couple of lines of code to check whether stale answers are enabled for a given view, that code was extracted into a dedicated function. (cherry picked from commit `74840ec50b`)	2021-01-29 10:35:26 +01:00

12928 commits