bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-04-28 17:46:40 -04:00

Author	SHA1	Message	Date
Matthijs Mekking	194a72b3f1	If RPZ config'd, bail stale-answer-client-timeout When we are recursing, RPZ processing is not allowed. But when we are performing a lookup due to "stale-answer-client-timeout", we are still recursing. This effectively means that RPZ processing is disabled on such a lookup. In this case, bail the "stale-answer-client-timeout" lookup and wait for recursion to complete, as we can't perform the RPZ rewrite rules reliably. (cherry picked from commit `3d3a6415f7`)	2021-04-02 13:29:27 +02:00
Matthijs Mekking	29bcd113ea	Rename "staleonly" The dboption DNS_DBFIND_STALEONLY caused confusion because it implies we are looking for stale data only and ignore any active RRsets in the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear the option is related to a lookup due to "stale-answer-client-timeout". Rename other usages of "staleonly", instead use "lookup due to...". Also rename related function and variable names. (cherry picked from commit `839df94190`)	2021-04-02 13:29:17 +02:00
Matthijs Mekking	34dd6521b1	Restore the RECURSIONOK attribute after staleonly When doing a staleonly lookup we don't want to fallback to recursion. After all, there are obviously problems with recursion, otherwise we wouldn't do a staleonly lookup. When resuming from recursion however, we should restore the RECURSIONOK flag, allowing future required lookups for this client to recurse. (cherry picked from commit `3f81d79ffb`)	2021-04-02 13:29:09 +02:00
Matthijs Mekking	114dc7888a	Remove result exception on staleonly lookup When implementing "stale-answer-client-timeout", we decided that we should only return positive answers prematurely to clients. A negative response is not useful, and in that case it is better to wait for the recursion to complete. To do so, we check the result and if it is not ISC_R_SUCCESS, we decide that it is not good enough. However, there are more return codes that could lead to a positive answer (e.g. CNAME chains). This commit removes the exception and now uses the same logic that other stale lookups use to determine if we found a useful stale answer (stale_found == true). This means we can simplify two test cases in the serve-stale system test: nodata.example is no longer treated differently than data.example. (cherry picked from commit `aaed7f9d8c`)	2021-04-02 13:28:59 +02:00
Matthijs Mekking	06823aa255	Remove INSIST on NS_QUERYATTR_ANSWERED The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response twice. Without the attribute, this may happen if a staleonly lookup found a useful answer and sends a response to the client, and later recursion ends and also tries to send a response. The attribute was also used to mask adding a duplicate RRset. This is considered harmful. When we created a response to the client with a stale only lookup (regardless of whether we have actually sent the response), we should clear the rdatasets that were added during that lookup. Mark such rdatasets with a new attribute, DNS_RDATASETATTR_STALE_ADDED. Set a query attribute NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale only lookup. Before creating a response on a normal lookup, check if we can expect rdatasets to have been added during a staleonly lookup. If so, clear the rdatasets from the message with the attribute DNS_RDATASETATTR_STALE_ADDED set. (cherry picked from commit `3d5429f61f`)	2021-04-02 13:28:08 +02:00
Matthijs Mekking	33d61b9651	Simplify when to detach the client With stale-answer-client-timeout, we may send a response to the client, but we may want to hold on to the network manager handle, because recursion is going on in the background, or we need to refresh a stale RRset. Simplify the setting of 'nodetach': * During a staleonly lookup we should not detach the nmhandle, so just set it prior to 'query_lookup()'. * During a staleonly "stalefirst" lookup set the 'nodetach' to true if we are going to refresh the RRset. Now there is no longer the need to clear the 'nodetach' if we go through the "dbfind_stale", "stale_refresh_window", or "stale_only" paths. (cherry picked from commit `48b0dc159b`)	2021-04-02 13:28:01 +02:00
Matthijs Mekking	b1496d19d5	Refactor stale lookups, ignore active RRsets When doing a staleonly lookup, ignore active RRsets from cache. If we don't, we may add a duplicate RRset to the message, and hit an assertion failure in query.c because adding the duplicate RRset to the ANSWER section failed. This can happen on a race condition. When a client query is received, the recursion is started. When 'stale-answer-client-timeout' triggers around the same time the recursion completes, the following sequence of events may happen: 1. Queue the "try stale" fetch_callback() event to the client task. 2. Add the RRsets from the authoritative response to the cache. 3. Queue the "fetch complete" fetch_callback() event to the client task. 4. Execute the "try stale" fetch_callback(), which retrieves the just-inserted RRset from the database. 5. In "ns_query_done()" we are still recursing, but the "staleonly" query attribute has already been cleared. In other words, the query will resume when recursion ends (it already has ended but is still on the task queue). 6. Execute the "fetch complete" fetch_callback(). It finds the answer from recursion in the cache again and tries to add the duplicate to the answer section. This commit changes the logic for finding stale answers in the cache, such that on "stale_only" lookups actually only stale RRsets are considered. It refactors the code so that code paths for "dbfind_stale", "stale_refresh_window", and "stale_only" are more clear. First we call some generic code that applies in all three cases: it formats the domain name for logging purposes, increments the trystale stats, and checks if we actually found stale data that we can use. The "dbfind_stale" lookup will return SERVFAIL if we didn't find a usable answer, otherwise we will continue with the lookup (query_gotanswer()). This is no different from before the introduction of "stale-answer-client-timeout" and "stale-refresh-time".
The "stale_refresh_window" lookup is similar to the "dbfind_stale" lookup: return SERVFAIL if we didn't find a usable answer, otherwise continue with the lookup (query_gotanswer()). Finally the "stale_only" lookup. If the "stale_only" lookup was triggered because of an actual client timeout (stale-answer-client-timeout > 0), and if the database lookup returned a usable stale RRset, trigger a response to the client. Otherwise return and wait until the recursion completes (or the resolver query times out). If the "stale_only" lookup is a "stale-answer-client-timeout 0" lookup, stale data is preferred over a lookup. In this case, if there was no stale data, or the data was not a positive answer, retry the lookup with the stale options cleared, a.k.a. a normal lookup. Otherwise, continue with the lookup (query_gotanswer()) and refresh the stale RRset. This will trigger a response to the client, but will not detach the handle because a fetch will be created to refresh the RRset. (cherry picked from commit `92f7a67892`)	2021-04-02 13:27:52 +02:00
Matthijs Mekking	fcf8fb4f39	Keep track of allow client detach The stale-answer-client-timeout feature introduced a dependency on when a client may be detached from the handle. The dboption DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads the meaning of this database option, and actually introduced a bug because the option was checked in other places. In particular, in 'ns_query_done()' there is a check for 'RECURSING(qctx->client) && (!QUERY_STALEONLY(&qctx->client->query) \|\| ...' and the condition is satisfied because recursion has not completed yet and DNS_DBFIND_STALEONLY is already cleared by that time (in query_lookup()), because we found a useful answer and we should detach the client from the handle after sending the response. Add a new boolean to the client structure to keep track of whether detaching the client from the handle is allowed. It is only disallowed if we are in a staleonly lookup and we didn't find a useful answer. (cherry picked from commit `fee164243f`)	2021-04-02 13:27:43 +02:00
Ondřej Surý	565a6a5679	Move the dummy shims to single ifndef GSSAPI block Previously, every function had its own #ifdef GSSAPI #else #endif block that defined a shim function in case GSSAPI was not being used. Now the dummy shim functions have been split out into a single #else #endif block at the end of the file. This makes gssapictx.c similar to the 9.17.x code, making backports and reviews easier.	2021-04-01 10:42:32 +02:00
Mark Andrews	3fd30e1634	Add Heimdal compatibility support The Heimdal Kerberos library handles the OID sets in a different manner. Unify the handling of the OID sets between MIT and Heimdal implementations by dynamically creating the OID sets instead of using static predefined set. This is how upstream recommends to handle the OID sets.	2021-04-01 10:42:32 +02:00
Mark Andrews	a875dcc669	Remove custom ISC SPNEGO implementation The custom ISC SPNEGO mechanism implementation is no longer needed on the basis that all major Kerberos 5/GSSAPI (mit-krb5, heimdal and Windows) implementations support SPNEGO mechanism since 2006. This commit removes the custom ISC SPNEGO implementation, and removes the option from both autoconf and win32 Configure script. Unknown options are being ignored, so this doesn't require any special handling.	2021-04-01 10:42:32 +02:00
Ondřej Surý	ee7283b3ee	Merge branch 'bind-dyndb-ldap-v9.16.13' into 'main' Do not require config.h to use isc/util.h See merge request isc-projects/bind9!4840 (cherry picked from commit `19b69e9a3b`) `81eb3396` Do not require config.h to use isc/util.h	2021-03-26 18:48:06 +00:00
Matthijs Mekking	1f8c5786f8	Delete CDS/CDNSKEY records when zone is unsigned CDS/CDNSKEY DELETE records are only useful if they are signed, otherwise the parent cannot verify these RRsets anyway. So once the DS has been removed (and signaled to BIND), we can remove the DNSKEY and RRSIG records, and at this point we can also remove the CDS/CDNSKEY records. (cherry picked from commit `6f31f62d69`)	2021-03-22 13:57:10 +01:00
Matthijs Mekking	7882c7fbea	Allow CDS/CDNSKEY DELETE records in unsigned zone While not useful, having a CDS/CDNSKEY DELETE record in an unsigned zone is not an error and "named-checkzone" should not complain. (cherry picked from commit `f211c7c2a1`)	2021-03-22 13:31:02 +01:00
Matthijs Mekking	b81502f4ae	Fix keymgr key init bug The 'keymgr_key_init()' function initializes key states if they have not been set previously. It looks at the key timing metadata and determines using the given times whether a state should be set to RUMOURED or OMNIPRESENT. However, the DNSKEY and ZRRSIG states were mixed up: When looking at the Activate timing metadata we should set the ZRRSIG state, and when looking at the Published timing metadata we should set the DNSKEY state. (cherry picked from commit `27e7d5f698`)	2021-03-22 11:24:55 +01:00
Patrick McLean	c5c9c9b83f	Add isc_time_now_hires function to get current time with high resolution The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates on a timer tick. This clock is generally fine for millisecond accuracy, but on servers with 100hz clocks, this clock is nowhere near accurate enough for microsecond accuracy. This commit adds a new isc_time_now_hires function that uses CLOCK_REALTIME, which gives the current time, though it is somewhat expensive to call. When microsecond accuracy is required, it may be required to use extra resources for higher accuracy. (cherry picked from commit `ebced74b19`)	2021-03-20 11:59:21 -07:00
Witold Kręcicki	a6c4702796	Fix a startup/shutdown crash in ns_clientmgr_create	2021-03-18 15:33:28 -03:00
Witold Kręcicki	dd564da286	Shutdown interface if we can't listen on it to avoid shutdown hang	2021-03-18 15:27:28 -03:00
Ondřej Surý	121641686c	Temporarily disable tlsdns_test until it gets refactored The tlsdns API is not yet used in the 9.16 branch and the tlsdns_test fails too often. Temporarily disable running the test until it is actually needed.	2021-03-18 15:42:03 +01:00
Ondřej Surý	db49ffca20	Change the isc_nm_(get\|set)timeouts() to work with milliseconds RFC7828 specifies the keepalive interval to be 16-bit, specified in units of 100 milliseconds, and the configuration options tcp-*-timeouts follow suit. The units of 100 milliseconds are very unintuitive and while we can't change the configuration and presentation format, we should not follow this weird unit in the API. This commit changes the isc_nm_(get\|set)timeouts() functions to work with milliseconds and convert the values to milliseconds before passing them to the function, not just internally.	2021-03-18 15:16:13 +01:00
Ondřej Surý	5d0647e067	Merge the common parts between udp, tcpdns and tlsdns protocol The udp, tcpdns and tlsdns code contained a lot of cut&paste code or code that was very similar, making the stack harder to maintain as any change to one would have to be copied to the other protocols. In this commit, we merge the common parts into the common functions under the isc__nm_<foo> namespace and just keep the little differences based on the socket type.	2021-03-18 15:16:13 +01:00
Ondřej Surý	a017ba2615	Fix TCPDNS and TLSDNS timers After the TCPDNS refactoring the initial and idle timers were broken and only the tcp-initial-timeout was always applied on the whole TCP connection. This broke any TCP connection that took longer than tcp-initial-timeout, most often this would affect large zone AXFRs. This commit changes the timeout logic in this way: * On TCP connection accept the tcp-initial-timeout is applied and the timer is started * When we are processing and/or sending any DNS message the timer is stopped * When we stop processing all DNS messages, the tcp-idle-timeout is applied and the timer is started again	2021-03-18 15:16:13 +01:00
Mark Andrews	3f9d7227b3	Stop using deprecated calls in lib/isc/tls.c from Rosen Penev @neheb (cherry picked from commit `a9f883cbc2`)	2021-03-18 10:15:44 +11:00
Matthijs Mekking	1a29d7ae7d	Fully sign a thawed zone When thawing a zone, we don't know what changes have been made. If we do DNSSEC maintenance on this zone, schedule a full sign. (cherry picked from commit `b90846f222`)	2021-03-17 11:12:39 +01:00
Matthijs Mekking	3ae16d0480	Fix "unable to thaw dynamic kasp zone" Dynamic zones with dnssec-policy could not be thawed because KASP zones were considered always dynamic. But a dynamic KASP zone should also check whether updates are disabled. (cherry picked from commit `b518ed9f46`)	2021-03-17 11:12:29 +01:00
Matthijs Mekking	96953fc293	Fix servestale fetchlimits crash When we query the resolver for a domain name that is in the same zone for which one or more fetches are already outstanding, we could potentially hit the fetch limits. If so, recursion fails immediately for the incoming query and if serve-stale is enabled, we may try to return a stale answer. If the resolver is also authoritative for the parent zone (for example the root zone), first a delegation is found, but we first check the cache for a better response. Nothing is found in the cache, so we try to recurse to find the answer to the query. Because of fetch-limits 'dns_resolver_createfetch()' returns an error, which 'ns_query_recurse()' propagates to the caller, 'query_delegation_recurse()'. Because serve-stale is enabled, 'query_usestale()' is called, setting 'qctx->db' to the cache db, but leaving 'qctx->version' untouched. Now 'query_lookup()' is called to search for stale data in the cache database with a non-NULL 'qctx->version' (which is set to a zone db version), and thus we hit an assertion in rbtdb. This crash was introduced in 'v9_16' by commit `2afaff75ed`. (cherry picked from commit `87591de6f7`)	2021-03-11 13:47:20 +01:00
Evan Hunt	401642cbb4	fix automatic journal upgrade on windows - use a value less than 2^32 for DNS_ZONEFLG_FIXJOURNAL; a larger value could cause problems in some build environments. the zone flag DNS_ZONEFLG_DIFFONRELOAD, which was no longer in use, has now been deleted and its value reused for _FIXJOURNAL. (cherry picked from commit `990dd9dbff`)	2021-03-08 09:43:00 +01:00
Evan Hunt	2aed5c5954	add start and end serial numbers to 'named-journalprint -x' add the starting and ending serial number from the journal header to the output of dns_journal_print(). (cherry picked from commit `46c11726d4`)	2021-03-08 09:43:00 +01:00
Mark Andrews	898e9989f9	Silence CID 329159: Logically dead code in lib/dns/journal.c *** CID 329159: (DEADCODE) /lib/dns/journal.c: 1719 in dns_journal_print() 1713 } 1714 CHECK(dns_difftuple_create( 1715 diff.mctx, n_soa == 1 ? DNS_DIFFOP_DEL : DNS_DIFFOP_ADD, 1716 name, ttl, rdata, &tuple)); 1717 dns_diff_append(&diff, &tuple); 1718 >>> CID 329159: (DEADCODE) >>> Execution cannot reach the expression "printxhdr" inside this statement: "if (++n_put != 0U \|\| printx...". 1719 if (++n_put != 0 \|\| printxhdr) { 1720 result = dns_diff_print(&diff, file); 1721 dns_diff_clear(&diff); 1722 n_put = 0; 1723 if (result != ISC_R_SUCCESS) { 1724 break; /lib/dns/journal.c: 1734 in dns_journal_print() 1728 if (result == ISC_R_NOMORE) { 1729 result = ISC_R_SUCCESS; 1730 } 1731 CHECK(result); 1732 1733 if (!printxhdr && n_put != 0) { >>> CID 329159: (DEADCODE) >>> Execution cannot reach this statement: "result = dns_diff_print(&di...". 1734 result = dns_diff_print(&diff, file); 1735 dns_diff_clear(&diff); 1736 } 1737 goto cleanup; 1738 1739 failure: (cherry picked from commit `848e336db4`)	2021-03-08 16:44:43 +11:00
Mark Andrews	fb7e38cca2	Silence CID 329157: Dereference before null check in lib/dns/journal.c *** CID 329157: Null pointer dereferences (REVERSE_INULL) /lib/dns/journal.c: 754 in journal_open() 748 j->header.index_size * sizeof(journal_rawpos_t)); 749 } 750 if (j->index != NULL) { 751 isc_mem_put(j->mctx, j->index, 752 j->header.index_size * sizeof(journal_pos_t)); 753 } CID 329157: Null pointer dereferences (REVERSE_INULL) Null-checking "j->filename" suggests that it may be null, but it has already been dereferenced on all paths leading to the check. 754 if (j->filename != NULL) { 755 isc_mem_free(j->mctx, j->filename); 756 } 757 if (j->fp != NULL) { 758 (void)isc_stdio_close(j->fp); 759 } (cherry picked from commit `4054405909`)	2021-03-08 16:12:01 +11:00
Ondřej Surý	1c2e018962	Assigning uint64_t from buffer might be misaligned in netmgr tests Resolve possible 8-byte unaligned access when assigning the magic value from the received buffer. (cherry picked from commit `a55bdb28f9`)	2021-03-04 15:29:44 +01:00
Mark Andrews	985566fea5	Fixed librpz_dnsrpzd_path being a duplicate symbol librpz_dnsrpzd_path should have been declared extern in dns/librpz.h (cherry picked from commit `8f016dd57a`)	2021-03-04 11:28:49 +01:00
Evan Hunt	9f1f5abe0e	print journal index data and test for consistency 'named-journalprint -x' now prints the journal's index table and the offset of each transaction in the journal, so that index consistency can be confirmed. (cherry picked from commit `a4972324a6`)	2021-03-03 19:19:50 -08:00
Mark Andrews	5aea511e1b	extend named-journalprint to be able to force the journal version named-journalprint can now upgrade or downgrade a journal file in place; the '-u' option upgrades and the '-d' option downgrades. (cherry picked from commit `fb2d0e2897`)	2021-03-03 19:19:50 -08:00
Evan Hunt	47a274e9f1	allow dns_journal_rollforward() to read old journal files when the 'max-ixfr-ratio' option was added, journal transaction headers were revised to include a count of RR's in each transaction. this made it impossible to read old journal files after an upgrade. this branch restores the ability to read version 1 transaction headers. when rolling forward, printing journal contents, if the wrong transaction header format is found, we can switch. when dns_journal_rollforward() detects a version 1 transaction header, it returns DNS_R_RECOVERABLE. this triggers zone_postload() to force a rewrite of the journal file in the new format, and also to schedule a dump of the zone database with minimal delay. journal repair is done by dns_journal_compact(), which rewrites the entire journal, ignoring 'max-journal-size'. journal size is corrected later. newly created journal files now have "BIND LOG V9.2" in their headers instead of "BIND LOG V9". files with the new version string cannot be read using the old transaction header format. note that this means newly created journal files will be rejected by older versions of named. named-journalprint now takes a "-x" option, causing it to print transaction header information before each delta, including its format version. (cherry picked from commit `ee19966326`)	2021-03-03 19:19:50 -08:00
Mark Andrews	7a703244ed	Address theoretical buffer overrun in recent change The strlcat() call was wrong. *** CID 316608: Memory - corruptions (OVERRUN) /lib/dns/resolver.c: 5017 in fctx_create() 5011 * Make fctx->info point to a copy of a formatted string 5012 * "name/type". 5013 */ 5014 dns_name_format(name, buf, sizeof(buf)); 5015 dns_rdatatype_format(type, typebuf, sizeof(typebuf)); 5016 p = strlcat(buf, "/", sizeof(buf)); >>> CID 316608: Memory - corruptions (OVERRUN) >>> Calling "strlcat" with "buf + p" and "1036UL" is suspicious because "buf" points into a buffer of 1036 bytes and the function call may access "(char *)(buf + p) + 1035UL". [Note: The source code implementation of the function has been overridden by a builtin model.] 5017 strlcat(buf + p, typebuf, sizeof(buf)); 5018 fctx->info = isc_mem_strdup(mctx, buf); 5019 5020 FCTXTRACE("create"); 5021 dns_name_init(&fctx->name, NULL); 5022 dns_name_dup(name, mctx, &fctx->name); (cherry picked from commit `59bf6e71e2`)	2021-03-03 10:55:38 +01:00
Ondřej Surý	59d92c8af8	Call isc__initialize()/isc__shutdown() from win32 DllMain Call the libisc isc__initialize() constructor and isc__shutdown() destructor from DllMain instead of having duplicate code between those and DllMain() code. (cherry picked from commit `a50f5d0cf5`)	2021-03-02 08:50:29 +01:00
Ondřej Surý	86298fd112	Make linker include library ctor/dtor in statically linked libisc Under normal circumstances, the linker throws out all symbols from a compilation unit when none of its symbols are used in the final binary, which is the case for lib/isc/lib.c. This commit adds an empty function to lib.c that is called from a different CU (mem.c in this case), which makes the linker include all the symbols, including the normally unreferenced isc__initialize() and isc__shutdown(), in the final binary.	2021-02-26 21:19:33 +01:00
Ondřej Surý	f92b77ff0d	Change the isc_thread_self() return type to uintptr_t The pthread_self(), thrd_current() or GetCurrentThreadId() could actually be a pointer, so we should rather convert the value into uintptr_t instead of unsigned long. (cherry picked from commit `a0181056a8`)	2021-02-26 21:14:17 +01:00
Ondřej Surý	0e90858d99	Use globally assigned thread_id in the isc_hp API Convert the isc_hp API to use the globally available isc_tid_v instead of locally defined tid_v. This should solve most of the problems on machines with a large number of cores / CPUs. (cherry picked from commit `bea333f7c9`)	2021-02-26 21:14:17 +01:00
Ondřej Surý	45c55b1e7e	Add isc_trampoline API to have simple accounting around threads The current isc_hp API uses an internal tid_v variable that gets incremented for each new thread using hazard pointers. This tid_v variable is then used as an index into a global shared table with hazard pointers state. Since tid_v is only incremented and never decremented, the table could overflow very quickly if we create a set of threads for a short period of time, they finish the work and cease to exist, and then we create an identical set of threads, and so on. This is not a problem for normal `named` operation as the set of threads is stable, but it is a problem in the unit tests, where we test the network manager or other APIs (task, timer) that create threads. This commit adds a thin wrapper around any function called from isc_thread_create() that adds a unique-but-reusable small-integer thread id that can be used as an index into e.g. hazard pointer tables. The trampoline wrapper ensures that the thread ids will be reused, so the highest thread_id number doesn't grow indefinitely when threads are created and destroyed and then created again. This fixes the hazard pointer table overflow on machines with many cores. [GL #2396] (cherry picked from commit `cbbecfcc82`)	2021-02-26 21:14:17 +01:00
Ondřej Surý	0bd834856e	Enforce disabling internal memory allocator for AddressSanitizer Disable the internal memory allocator when AddressSanitizer is in use. The basic blocks in the internal memory allocator prevent AddressSanitizer from properly tracking the allocations and deallocations, so we need to ensure it has been disabled for any build that has AddressSanitizer enabled.	2021-02-26 10:13:56 -08:00
Ondřej Surý	1cd14c4dec	Add mempool get/put tracking with AddressSanitizer When AddressSanitizer is in use, disable the internal mempool implementation and redirect the isc_mempool_get to isc_mem_get (and similarly for isc_mempool_put). This is the method recommended by the AddressSanitizer authors for tracking allocations and deallocations instead of custom poison/unpoison code (see https://github.com/google/sanitizers/wiki/AddressSanitizerManualPoisoning).	2021-02-26 10:13:17 -08:00
Ondřej Surý	2011efa99d	Disable safe-guard assertion in DLL_THREAD_ATTACH/DLL_THREAD_DETACH The BIND 9 libraries on Windows define DllMain(), the optional entry point into a dynamic-link library (DLL). When the system starts or terminates a process or thread, it calls the entry-point function for each loaded DLL using the first thread of the process. When the DLL is being loaded into the virtual address space of the current process as a result of the process starting up, we make a call to DisableThreadLibraryCalls() which should disable the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified dynamic-link library (DLL). This seems not to be the case because we never check the return value of the DisableThreadLibraryCalls() call, and it could in fact fail. The DisableThreadLibraryCalls() function fails if the DLL specified by hModule has active static thread local storage, or if hModule is an invalid module handle. In this commit, we remove the safe-guard assertion put in place for the DLL_THREAD_ATTACH and DLL_THREAD_DETACH events and we just ignore them. BIND 9 doesn't create/destroy enough threads for it actually to make any difference, and in fact we do use static thread local storage in the code.	2021-02-26 17:18:06 +01:00
Ondřej Surý	2f0f531ee8	Use library constructor/destructor to initialize OpenSSL Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use gcc/clang attributes on POSIX and DLLMain on Windows to initialize and shutdown OpenSSL library. This resolves the issue when isc_nm_create() / isc_nm_destroy() was called multiple times and it would call OpenSSL library destructors from isc_nm_destroy(). At the same time, since we now have introduced the ctor/dtor for libisc, this commit moves the isc_mem API initialization (the list of the contexts) and changes the isc_mem_checkdestroyed() to schedule the checking of memory context on library unload instead of executing the code immediately.	2021-02-26 17:18:06 +01:00
Ondřej Surý	59d51cdeb6	Disable calling DllMain() on thread creation/destruction Disables the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified dynamic-link library (DLL). This can reduce the size of the working set for some applications.	2021-02-26 17:07:15 +01:00
Ondřej Surý	6883d4bd08	Fix the invalid condition variable Although harmless, the memmove() in tlsdns and tcpdns was guarded by a current message length variable that was always bigger than 0 instead of correct current buffer length remainder variable.	2021-02-26 17:07:09 +01:00
Ondřej Surý	6da91bf16c	Move most of the OpenSSL initialization to isc_tls Since we now require both libcrypto and libssl to be initialized for netmgr, we move all the OpenSSL initialization code except the engine initialization to the isc_tls API. The isc_tls_initialize() and isc_tls_destroy() functions have been made idempotent, so they can be called multiple times. However, once isc_tls_destroy() has been called, isc_tls_initialize() cannot be called again.	2021-02-26 17:07:01 +01:00
Ondřej Surý	effe3ee595	Refactor TLSDNS module to work with libuv/ssl directly * Following the example set in `634bdfb16d`, the tlsdns netmgr module now uses libuv and SSL primitives directly, rather than opening a TLS socket which opens a TCP socket, as the previous model was difficult to debug. Closes #2335. * Remove the netmgr tls layer (we will have to re-add it for DoH) * Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc library; move the OpenSSL initialization/deinitialization from dstapi needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}() * Add a couple of new shims needed for OpenSSL 1.0.x * When LibreSSL is used, require at least version 2.7.0 that has the best OpenSSL 1.1.x compatibility and auto init/deinit * Enforce OpenSSL 1.1.x usage on Windows (cherry picked from commit `e493e04c0f`)	2021-02-26 16:14:50 +01:00
Matthijs Mekking	acc95d4e1d	Don't servfail on staleonly lookups When a staleonly lookup doesn't find a satisfying answer, it should not try to respond to the client. This is not true when the initial lookup is staleonly (that is when 'stale-answer-client-timeout' is set to 0), because no resolver fetch has been created at this point. In this case continue with the lookup normally. (cherry picked from commit `f8b7b597e9`)	2021-02-25 12:07:34 +01:00
Matthijs Mekking	84deb57bc3	Don't allow recursion on staleonly lookups Fix a crash that can happen in the following scenario: A client request is received. There is no data for it in the cache, (not even stale data). A resolver fetch is created as part of recursion. Some time later, the fetch still hasn't completed, and stale-answer-client-timeout is triggered. A staleonly lookup is started. It will also find no data in the cache. So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND, so this will call 'query_notfound()' and this will start recursion. We will eventually end up in 'ns_query_recurse()' and that requires the client query fetch to be NULL: REQUIRE(client->query.fetch == NULL); If the previously started fetch is still running this assertion fails. The crash is easily prevented by not requiring recursion for staleonly lookups. Also remove a redundant setting of the staleonly flag at the end of 'query_lookup_staleonly()' before destroying the query context. Add a system test to catch this case. (cherry picked from commit `9e061faaae`)	2021-02-25 12:07:27 +01:00
Matthijs Mekking	d51df34995	Fix dnssec-policy NSEC3 on dynamic zones When applying dnssec-policy on a dynamic zone (e.g. that allows Dynamic Updates), the NSEC3 parameters were put on the queue, but they were not being processed (until a reload of the zone or reconfiguration). Process the NSEC3PARAM queue on zone postload when handling a dynamic zone. (cherry picked from commit `4b176c850b`)	2021-02-25 10:56:00 +01:00
Michal Nowak	b36690e750	Initialize checknames field in dns_view_create() The 'checknames' field wasn't initialized in dns_view_create(), but it should otherwise AddressSanitizer identifies the following runtime error in query_test.c. runtime error: load of value 190, which is not a valid value for type '_Bool' (cherry picked from commit `0c6fa16477`)	2021-02-23 16:45:36 +01:00
Matthijs Mekking	8e4d0d51be	Purge keys implementation On each keymgr run, we now also check if key files can be removed. The 'purge-keys' interval determines how long keys should be retained after they have become completely hidden. Key files should not be removed if the key has a state that is set to something other than HIDDEN, if purge-keys is 0 (disabled), if the key goal is set to OMNIPRESENT, or if the key is unused (a key is unused if no timing metadata is set, and no states are set or, if set, they are set to HIDDEN). If the last changed timing metadata plus the purge-keys interval is in the past, the key files may be removed. Add a dst_key_t variable 'purge' to signal that the key file should not be written to file again. (cherry picked from commit `8c526cb67f`)	2021-02-23 09:19:03 +01:00
Matthijs Mekking	45dcabf411	Add purge-keys config option Add a new option 'purge-keys' to 'dnssec-policy' that will purge key files for deleted keys. The option determines how long key files should be retained prior to removing the corresponding files from disk. If set to 0, the option is disabled and 'named' will not remove key files from disk. (cherry picked from commit `313de3a7e2`)	2021-02-23 09:18:55 +01:00
Mark Andrews	4e192d2fe7	Address theoretical resource leak in dns_dt_open() dns_dt_open() is not currently called with mode dns_dtmode_unix. *** CID 281489: Resource leaks (RESOURCE_LEAK) /lib/dns/dnstap.c: 983 in dns_dt_open() 977 978 if (!dnstap_file(handle->reader)) { 979 CHECK(DNS_R_BADDNSTAP); 980 } 981 break; 982 case dns_dtmode_unix: CID 281489: Resource leaks (RESOURCE_LEAK) Variable "handle" going out of scope leaks the storage it points to. 983 return (ISC_R_NOTIMPLEMENTED); 984 default: 985 INSIST(0); 986 ISC_UNREACHABLE(); 987 } 988 (cherry picked from commit `003dd8cc70`)	2021-02-23 09:41:15 +11:00
Mark Andrews	d68b85e555	Correctly detect when get_direction failed (cherry picked from commit `009358d77d`)	2021-02-19 11:39:12 +11:00
Mark Andrews	0cea486327	Test a LOC record with an invalid direction field (cherry picked from commit `07902d9f9d`)	2021-02-19 11:39:12 +11:00
Ondřej Surý	b04cb88462	Fix off-by-one bug in ISC SPNEGO implementation The ISC SPNEGO implementation is based on mod_auth_kerb code. When CVE-2006-5989 was disclosed, the relevant fix was not applied to the BIND 9 codebase, making the latter vulnerable to the aforementioned flaw when "tkey-gssapi-keytab" or "tkey-gssapi-credential" is set in named.conf. The original description of CVE-2006-5989 was: Off-by-one error in the der_get_oid function in mod_auth_kerb 5.0 allows remote attackers to cause a denial of service (crash) via a crafted Kerberos message that triggers a heap-based buffer overflow in the component array. Later research revealed that this flaw also theoretically enables remote code execution, though achieving the latter in real-world conditions is currently deemed very difficult. This vulnerability was responsibly reported as ZDI-CAN-12302 ("ISC BIND TKEY Query Heap-based Buffer Overflow Remote Code Execution Vulnerability") by Trend Micro Zero Day Initiative.	2021-02-17 22:36:08 +01:00
Ondřej Surý	d7b3a6a016	Rollback setting IP_DONTFRAG option on the UDP sockets In DNS Flag Day 2020, the development branch started setting the IP_DONTFRAG option on the UDP sockets. It turned out that this code was incomplete, leading to outgoing UDP packets being dropped. Hence this commit rolls back this setting until we have a proper fix that would send back an empty response with the TC flag set. (cherry picked from commit `66eefac78c`)	2021-02-17 14:41:56 +01:00
Michal Nowak	f483b102dd	Drop USE_OPENSSL constraint from dh_test The USE_OPENSSL constraint in dh_test does not seem to be necessary anymore; the test runs with PKCS#11 as well. (cherry picked from commit `c341e7f740`)	2021-02-17 12:46:25 +01:00
Michal Nowak	ed38e32b69	Ensure dnstap_test returns SKIPPED_TEST_EXIT_CODE Make sure lib/dns/tests/dnstap_test returns an exit code that indicates a skipped test when dnstap is not enabled. (cherry picked from commit `c286341703`)	2021-02-17 12:15:18 +01:00
Michal Nowak	04aff208fb	Use BIND 9.17 preprocessor macro to skip unit test BIND 9.17 changed exit code of skipped test to meet Automake expectations in `fa505bfb0e`. BIND 9.16 was not rewritten to Automake, but for consistency reasons, the same SKIPPED_TEST_EXIT_CODE preprocessor macro is used (though the actual exit code differs from the one in BIND 9.17). (cherry picked from commit `fa505bfb0e`)	2021-02-17 12:09:25 +01:00
Mark Andrews	d51b78c85b	Stop including <gssapi.h> from <dst/gssapi.h> header The only reason for including the gssapi.h from the dst/gssapi.h header was to get the typedefs of gss_cred_id_t and gss_ctx_id_t. Instead of using those types directly this commit introduces dns_gss_cred_id_t and dns_gss_ctx_id_t types that are being used in the public API and privately retyped to their counterparts when we actually call the gss api. This also conceals the gssapi headers, so users of the libdns library don't have to add GSSAPI_CFLAGS to the Makefile when including libdns dst API.	2021-02-16 12:08:21 +11:00
Ondřej Surý	4bbe3e75de	Stop including dnstap headers from <dns/dnstap.h> The <fstrm.h> and <protobuf-c/protobuf-c.h> headers are only directly included where used and we stopped exposing those headers from libdns headers.	2021-02-16 12:08:21 +11:00
Mark Andrews	bf5aac225b	Stop including <lmdb.h> from <dns/lmdb.h> The lmdb.h header doesn't have to be included from the dns/lmdb.h header as it can be separately included where used. This stops exposing the inclusion of lmdb.h from the libdns headers.	2021-02-16 12:08:21 +11:00
Mark Andrews	b8fc8742e5	Re-order include directories ${FSTRM_CFLAGS} ${PROTOBUF_C_CFLAGS} ${OPENSSL_CFLAGS} ${LMDB_CFLAGS} need to appear after all directories in the build tree.	2021-02-16 12:08:21 +11:00
Diego Fronza	d89a8bf696	Fix dangling references to outdated views after reconfig This commit fixes a leak that happened every time an inline-signed zone was added to the configuration, followed by an rndc reconfig. During the reconfig process, the secure version of every inline-signed zone was "moved" to a new view upon a reconfig and it "took the raw version along", but only once the secure version was freed (at shutdown) was prev_view for the raw version detached from, causing the old view to be released as well. This caused dangling references to be kept for the previous view, thus keeping all resources used by that view in memory.	2021-02-15 11:52:50 -03:00
Mark Andrews	6e30caed57	Silence Insecure data handling (TAINTED_SCALAR) Coverity assumes that the memory holding any value read using byte swapping is tainted. As we store the NSEC3PARAM records in wire form and iterations is byte swapped the memory holding the record is marked as tainted. nsec3->salt_length is marked as tainted transitively. To remove the taint the value needs to be range checked. For a correctly formatted record region.length should match nsec3->salt_length and provides a convenient value to check the field against. *** CID 316507: Insecure data handling (TAINTED_SCALAR) /lib/dns/rdata/generic/nsec3param_51.c: 241 in tostruct_nsec3param() 235 region.length = rdata->length; 236 nsec3param->hash = uint8_consume_fromregion(&region); 237 nsec3param->flags = uint8_consume_fromregion(&region); 238 nsec3param->iterations = uint16_consume_fromregion(&region); 239 240 nsec3param->salt_length = uint8_consume_fromregion(&region); >>> CID 316507: Insecure data handling (TAINTED_SCALAR) >>> Passing tainted expression "nsec3param->salt_length" to "mem_maybedup", which uses it as an offset. 241 nsec3param->salt = mem_maybedup(mctx, region.base, 242 nsec3param->salt_length); 243 if (nsec3param->salt == NULL) { 244 return (ISC_R_NOMEMORY); 245 } 246 isc_region_consume(&region, nsec3param->salt_length); (cherry picked from commit `c40133d840`)	2021-02-12 10:43:19 +11:00
Mark Andrews	8302e9fb69	Silence Untrusted value as argument (TAINTED_SCALAR) Coverity assumes that the memory holding any value read using byte swapping is tainted. As we store the NSEC3 records in wire form and iterations is byte swapped the memory holding the record is marked as tainted. nsec3->salt_length and nsec3->next_length are marked as tainted transitively. To remove the taint the values need to be range checked. Valid values for these should never exceed region.length so that becomes a reasonable value to check against. *** CID 316509: (TAINTED_SCALAR) /lib/dns/rdata/generic/nsec3_50.c: 312 in tostruct_nsec3() 306 if (nsec3->salt == NULL) { 307 return (ISC_R_NOMEMORY); 308 } 309 isc_region_consume(&region, nsec3->salt_length); 310 311 nsec3->next_length = uint8_consume_fromregion(&region); >>> CID 316509: (TAINTED_SCALAR) >>> Passing tainted expression "nsec3->next_length" to "mem_maybedup", which uses it as an offset. 312 nsec3->next = mem_maybedup(mctx, region.base, nsec3->next_length); 313 if (nsec3->next == NULL) { 314 goto cleanup; 315 } 316 isc_region_consume(&region, nsec3->next_length); 317 /lib/dns/rdata/generic/nsec3_50.c: 305 in tostruct_nsec3() 299 region.length = rdata->length; 300 nsec3->hash = uint8_consume_fromregion(&region); 301 nsec3->flags = uint8_consume_fromregion(&region); 302 nsec3->iterations = uint16_consume_fromregion(&region); 303 304 nsec3->salt_length = uint8_consume_fromregion(&region); >>> CID 316509: (TAINTED_SCALAR) >>> Passing tainted expression "nsec3->salt_length" to "mem_maybedup", which uses it as an offset. 305 nsec3->salt = mem_maybedup(mctx, region.base, nsec3->salt_length); 306 if (nsec3->salt == NULL) { 307 return (ISC_R_NOMEMORY); 308 } 309 isc_region_consume(&region, nsec3->salt_length); 310 (cherry picked from commit `fd8d1337a5`)	2021-02-12 10:43:19 +11:00
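The range check that the two TAINTED_SCALAR commits above describe follows a common pattern: compare a length read from the wire against what is left in the region before using it as a size or offset. The sketch below illustrates that pattern only. `region_t` and `parse_salt` are hypothetical names chosen for this example and are not the `isc_region_t` API or the code in nsec3_50.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a wire-format region (base pointer + length). */
typedef struct {
	const uint8_t *base;
	size_t length;
} region_t;

/*
 * Read a one-byte length followed by that many bytes of salt.
 * Returns 0 on success and -1 if the record is malformed.
 * On success the region is advanced past the salt.
 */
static int
parse_salt(region_t *r, const uint8_t **salt, uint8_t *salt_length) {
	assert(r != NULL && salt != NULL && salt_length != NULL);

	if (r->length < 1) {
		return -1;
	}
	uint8_t len = r->base[0];
	r->base += 1;
	r->length -= 1;

	/*
	 * The range check: a length taken from untrusted input must not
	 * exceed the bytes that remain before it is used as a size.
	 */
	if (len > r->length) {
		return -1;
	}

	*salt = r->base;
	*salt_length = len;
	r->base += len;
	r->length -= len;
	return 0;
}
```

A caller would treat the -1 result the way the rdata code treats a format error, and only copy `*salt_length` bytes from `*salt` after a successful return.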
Michal Nowak	001413ed50	Drop AddressSanitizer constraint from libns unit tests The AddressSanitizer constraint in some libns unit tests does not seem to be necessary anymore, these tests run fine under AddressSanitizer. (cherry picked from commit `613be8706e`)	2021-02-10 11:03:27 +01:00
Matthijs Mekking	2afaff75ed	Use stale on error also when unable to recurse The 'query_usestale()' function was only called when in 'query_gotanswer()' and an unexpected error occurred. This may have been "quota reached", and thus we were in some cases returning stale data on fetch-limits (and if serve-stale enabled of course). But we can also hit fetch-limits when recursing because we are following a referral (in 'query_notfound()' and 'query_delegation_recurse()'). Here we should also check for using stale data in case an error occurred. Specifically don't check for using stale data when refetching a zero TTL RRset from cache. Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()' function to avoid code duplication. (cherry picked from commit `8bcd7fe69e`)	2021-02-08 16:10:03 +01:00
Matthijs Mekking	dbf5428629	Only start stale refresh window when resuming If we did not attempt a fetch due to fetch-limits, we should not start the stale-refresh-time window. Introduce a new flag DNS_DBFIND_STALESTART to differentiate between a resolver failure and unexpected error. If we are resuming, this indicates a resolver failure, then start the stale-refresh-time window, otherwise don't start the stale-refresh-time window, but still fall back to using stale data. (This commit also wraps some docstrings to 80 characters width) (cherry picked from commit `aabdedeae3`)	2021-02-08 16:07:43 +01:00
Matthijs Mekking	809ec0a224	Use stale data also if we are not resuming Before this change, BIND would only fall back to using stale data if there was an actual attempt to resolve the query. Then on a timeout, the stale data from cache becomes eligible. This commit changes this so that on any unexpected error stale data becomes eligible (you would still have to have 'stale-answer-enable' enabled of course). If there is no stale data, this may result in an error again, so don't loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is set, this means we already tried to lookup stale data, so if that is the case, don't use stale again. (cherry picked from commit `c6fd02aed5`)	2021-02-08 16:07:43 +01:00
Mark Andrews	8092b7eec6	Remove redundant 'version == NULL' check *** CID 318094: Null pointer dereferences (REVERSE_INULL) /lib/dns/rbtdb.c: 1389 in newversion() 1383 version->xfrsize = rbtdb->current_version->xfrsize; 1384 RWUNLOCK(&rbtdb->current_version->rwlock, isc_rwlocktype_read); 1385 rbtdb->next_serial++; 1386 rbtdb->future_version = version; 1387 RBTDB_UNLOCK(&rbtdb->lock, isc_rwlocktype_write); 1388 CID 318094: Null pointer dereferences (REVERSE_INULL) Null-checking "version" suggests that it may be null, but it has already been dereferenced on all paths leading to the check. 1389 if (version == NULL) { 1390 return (result); 1391 } 1392 1393 *versionp = version; 1394 (cherry picked from commit `456d53d1fb`)	2021-02-08 16:17:52 +11:00
Mark Andrews	a900d79ea8	Cleanup redundant isc_rwlock_init() result checks (cherry picked from commit `3b11bacbb7`)	2021-02-08 15:13:49 +11:00
Mark Andrews	c2a5b88275	Attempt to silence untrusted loop bound Assign hit_len + key_len to len and test the result rather than recomputing and letting the compiler simplify. 213 isc_region_consume(&region, 2); /* hit length + algorithm / 9. tainted_return_value: Function uint16_fromregion returns tainted data. [show details] 10. tainted_data_transitive: Call to function uint16_fromregion with tainted argument region.base returns tainted data. 11. tainted_return_value: Function uint16_fromregion returns tainted data. 12. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data. 13. var_assign: Assigning: key_len = uint16_fromregion(&region), which taints key_len. 214 key_len = uint16_fromregion(&region); 14. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 15. Condition key_len == 0, taking false branch. 215 if (key_len == 0) { 216 RETERR(DNS_R_FORMERR); 217 } 16. Condition !!(_r->length >= _l), taking true branch. 17. Condition !!(_r->length >= _l), taking true branch. 218 isc_region_consume(&region, 2); 18. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 19. Condition region.length < (unsigned int)(hit_len + key_len), taking false branch. 219 if (region.length < (unsigned)(hit_len + key_len)) { 220 RETERR(DNS_R_FORMERR); 221 } 222 20. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 21. Condition _r != 0, taking false branch. 223 RETERR(mem_tobuffer(target, rr.base, 4 + hit_len + key_len)); 22. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound. 23. var_assign_var: Compound assignment involving tainted variable 4 + hit_len + key_len to variable source->current taints source->current. 
224 isc_buffer_forward(source, 4 + hit_len + key_len); 225 226 dns_decompress_setmethods(dctx, DNS_COMPRESS_NONE); CID 281461 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR) 24. tainted_data: Using tainted variable source->active - source->current as a loop boundary. Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range. 227 while (isc_buffer_activelength(source) > 0) { 228 dns_name_init(&name, NULL); 229 RETERR(dns_name_fromwire(&name, source, dctx, options, target)); 230 } (cherry picked from commit `2f946c831a`)	2021-02-08 14:05:11 +11:00
Mark Andrews	6da9f238d4	Allow A records below '_spf' labels as recommended by RFC 7208 (cherry picked from commit `63c16c8506`)	2021-02-03 16:26:25 +01:00
Matthijs Mekking	ce2a37a990	Use NUM_KEYSTATES constant where appropriate We use the number 4 a lot when working on key states. Better to use the NUM_KEYSTATES constant instead. (cherry picked from commit `98ace6d97d`)	2021-02-03 15:48:20 +01:00
Matthijs Mekking	a8fba11da9	Cleanup keymgr.c Three small cleanups: 1. Remove an unused keystr/dst_key_format. 2. Initialize a dst_key_state_t state with NA. 3. Update false comment about local policy (local policy only adds barrier on transitions to the RUMOURED state, not the UNRETENTIVE state). (cherry picked from commit `189d9a2d21`)	2021-02-03 15:47:40 +01:00
Matthijs Mekking	ceac392e19	Fix DS/DNSKEY hidden or chained functions There was a bug in function 'keymgr_ds_hidden_or_chained()'. The function 'keymgr_ds_hidden_or_chained()' implements (3e) of rule2 as defined in the "Flexible and Robust Key Rollover" paper. The rule says: All DS records need to be in the HIDDEN state, or if it is not there must be a key with its DNSKEY and KRRSIG in OMNIPRESENT, and its DS in the same state as the key in question. In human language, if all keys have their DS in HIDDEN state you can do what you want, but if a DS record is available to some validators, there must be a chain of trust for it. Note that the barriers on transitions first check if the current state is valid, and then if the next state is valid too. But here we falsely updated the 'dnskey_omnipresent' (now 'dnskey_chained') with the next state. The next state applies to 'key' not to the state to be checked. Updating the state here leads to (true) always, because the key that will move its state will match the falsely updated expected state. This could lead to the assumption that Key 2 would be a valid chain of trust for Key 1, while clearly the presence of any DS is uncertain. The fix here is to check if the DNSKEY and KRRSIG are in OMNIPRESENT state for the key that does not have its DS in the HIDDEN state, and only if that is not the case, ensure that there is a key with the same algorithm, that provides a valid chain of trust, that is, has its DNSKEY, KRRSIG, and DS in OMNIPRESENT state. The changes in 'keymgr_dnskey_hidden_or_chained()' are only cosmetic, renaming 'rrsig_omnipresent' to 'rrsig_chained' and removing the redundant initialization of the DST_KEY_DNSKEY expected state to NA. (cherry picked from commit `291bcc3721`)	2021-02-03 15:47:30 +01:00
Matthijs Mekking	6ff0e99fa7	Update keymgr_key_is_successor() calls The previous commit changed the function definition of 'keymgr_key_is_successor()', this commit updates the code where this function is called. In 'keymgr_key_exists_with_state()' the logic is also updated slightly to become more readable. First handle the easy cases: - If the key does not match the state, continue with the next key. - If we found a key with matching state, and there is no need to check the successor relationship, return (true). - Otherwise check the successor relationship. In 'keymgr_key_has_successor()' it is enough to check if a key has a direct successor, so instead of calling 'keymgr_key_is_successor()', we can just check 'keymgr_direct_dep()'. In 'dns_keymgr_run()', we want to make sure that there is no dependency on the keys before retiring excess keys, so replace 'keymgr_key_is_successor()' with 'keymgr_dep()'. (cherry picked from commit `600915d1b2`)	2021-02-03 15:47:23 +01:00
Matthijs Mekking	5e40515671	Implement Equation(2) of "Flexible Key Rollover" So far the key manager could only deal with two keys in a rollover, because it used a simplified version of the successor relationship equation from the "Flexible and Robust Key Rollover" paper. The simplified version assumes only two keys take part in the key rollover, and for that it is enough to check the direct relationship between two keys (is key x the direct predecessor of key z and is key z the direct successor of key x?). But when a third key (or more keys) comes into the equation, the key manager would assume that one key (or more) is redundant and remove it from the zone prematurely. Fix by implementing Equation(2) correctly, where we check for dependencies on keys: z ->T x: Dep(x, T) = ∅ ∧ (x ∈ Dep(z, T) ∨ ∃ y ∈ Dep(z, T)(y != z ∧ y ->T x ∧ DyKyRySy = DzKzRzSz)) This says: key z is a successor of key x if: - key x depends on key z if z is a direct successor of x, - or if there is another key y that depends on key z that has identical key states as key z and key y is a successor of key x. - Also, key x may not have any other keys depending on it. This is still a simplified version of Equation(2) (but at least much better), because the paper allows for a set of keys to depend on a key. This is defined as the set Dep(x, T). Keys in the set Dep(x, T) have a dependency on key x for record type T. The BIND implementation can only have one key in the set Dep(x, T). The function 'keymgr_dep()' stores this key in 'uint32_t *dep' if there is a dependency. There are two scenarios where multiple keys can depend on a single key: 1. Rolling keys is faster than the time required to finish the rollover procedure. This scenario is covered by the recursive implementation, and checking for a chain of direct dependencies will suffice. 2. Changing the policy, when a zone is requested to be signed with a different key length for example. 
BIND 9 will not mark successor relationships in this case, but tries to move towards the new policy. Since there is no successor relationship, the rules are even more strict, and the DNSSEC reconfiguration is actually slower than required. Note: this commit breaks the build, because the function definition of 'keymgr_key_is_successor' changed. This will be fixed in the following commit. (cherry picked from commit `cc38527b63`)	2021-02-03 15:47:14 +01:00
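The successor relation from commit 5e40515671 can be illustrated with a small recursive model. This is a sketch under stated assumptions rather than the keymgr.c code: `model_key_t`, `direct_dep`, `dep`, and `is_successor` are hypothetical names, key states are plain integers, and, as the commit message notes for BIND, at most one key is assumed to depend on any given key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_KEYSTATES 4

/* Hypothetical key model: ids plus the D, K, R, S states. */
typedef struct {
	unsigned id;
	unsigned predecessor; /* 0 if none */
	unsigned successor;   /* 0 if none */
	int states[NUM_KEYSTATES];
} model_key_t;

/* d depends on k when k is d's successor and d is k's predecessor. */
static bool
direct_dep(const model_key_t *d, const model_key_t *k) {
	return d->successor != 0 && d->successor == k->id &&
	       k->predecessor == d->id;
}

/* Return the single key that depends on k, or NULL: the model's Dep(k, T). */
static const model_key_t *
dep(const model_key_t *k, const model_key_t *ring, size_t n) {
	for (size_t i = 0; i < n; i++) {
		if (direct_dep(&ring[i], k)) {
			return &ring[i];
		}
	}
	return NULL;
}

static bool
same_states(const model_key_t *a, const model_key_t *b) {
	return memcmp(a->states, b->states, sizeof(a->states)) == 0;
}

/* Walk the dependency chain from z towards x; budget bounds the recursion. */
static bool
successor_walk(const model_key_t *x, const model_key_t *z,
	       const model_key_t *ring, size_t n, size_t budget) {
	if (budget == 0) {
		return false; /* malformed (cyclic) metadata */
	}
	const model_key_t *y = dep(z, ring, n);
	if (y == NULL) {
		return false; /* nothing depends on z */
	}
	if (y->id == x->id) {
		return true; /* x is in Dep(z, T): direct successor */
	}
	/* Another key y depends on z: require equal states, then recurse. */
	if (y->id != z->id && same_states(y, z)) {
		return successor_walk(x, y, ring, n, budget - 1);
	}
	return false;
}

/* Is z a successor of x? */
static bool
is_successor(const model_key_t *x, const model_key_t *z,
	     const model_key_t *ring, size_t n) {
	assert(x != NULL && z != NULL && ring != NULL);
	if (dep(x, ring, n) != NULL) {
		return false; /* Dep(x, T) must be empty */
	}
	return successor_walk(x, z, ring, n, n);
}
```

With a chain of three keys 1, 2, 3 in which each key is the recorded successor of the previous one, key 3 counts as a successor of key 1 only while keys 2 and 3 have identical states; this corresponds to the first scenario in the commit message, where keys are rolled faster than a rollover completes.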
Matthijs Mekking	3f6dafe1f4	Remove initialize goal code Since keys now have their goals initialized in 'keymgr_key_init()', remove this redundant piece of code in 'keymgr_key_run()'. (cherry picked from commit `82632fa6d9`)	2021-02-03 08:42:51 +01:00
Matthijs Mekking	4170288a91	Correctly initialize old key with state file The 'key_init()' function is used to initialize a state file for keys that don't have one yet. This can happen if you are migrating from an 'auto-dnssec' or 'inline-signing' to a 'dnssec-policy' configuration. It did not look at the "Inactive" and "Delete" timing metadata and so old keys left behind in the key directory would also be considered as a possible active key. This commit fixes this and now explicitly sets the key goal to OMNIPRESENT for keys that have their "Active/Publish" timing metadata in the past, but their "Inactive/Delete" timing metadata in the future. If the "Inactive/Delete" timing metadata is also in the past, the key goal is set to HIDDEN. If the "Inactive/Delete" timing metadata is in the past, the key states are also adjusted to either UNRETENTIVE or HIDDEN, depending on how far in the past the metadata is set. (cherry picked from commit `76cf72e65a`)	2021-02-03 08:42:32 +01:00
Diego Fronza	51663408bc	Fix race condition on check_stale_header This commit fixes a race that could happen when two or more threads have failed to refresh the same RRset: the threads could simultaneously attempt to update the header->last_refresh_fail_ts field in check_stale_header, a field used to implement stale-refresh-time. By making this field atomic we avoid the race. (cherry picked from commit `c75575e350`)	2021-01-29 15:29:00 +01:00
Matthijs Mekking	99c72bf5da	Update code flow in query.c wrt stale data First of all, there was a flaw in the code related to the 'stale-refresh-time' option. If stale answers are enabled, and we returned stale data, then it was assumed that it was because we were in the 'stale-refresh-time' window. But now we could also have returned stale data because of a 'stale-answer-client-timeout'. To fix this, introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to indicate whether the stale cache entry was returned because the 'stale-refresh-time' window is active. Second, remove the special case handling when the result is DNS_R_NCACHENXRRSET. This can be done more generically in the code block when dealing with stale data. Putting all stale case handling in the code block when dealing with stale data makes the code easier to follow. Update documentation to be more verbose and to match the new code flow. (cherry picked from commit `fa0c9280d2`)	2021-01-29 10:43:41 +01:00
Diego Fronza	0e62c53c5b	Extracted common function from query_lookup and query_refresh_rrset Both functions employed the same code lines to allocate query context buffers, which are used to store query results, so this shared portion of code was extracted out to a new function, qctx_prepare_buffers. Also, this commit uses qctx_init to initialize the query context within the query_refresh_rrset function. (cherry picked from commit `966060c03b`)	2021-01-29 10:43:27 +01:00
Diego Fronza	5cbb28a40e	Small optimization in query_usestale This commit makes the code in query_usestale easier to follow, it also doesn't attach/detach to the database if stale answers are not enabled. (cherry picked from commit `f89ac07b28`)	2021-01-29 10:41:39 +01:00
Diego Fronza	8324c3ddfe	Allow stale data to be used before name resolution This commit allows a stale RRset to be used (if available) for responding to a query, before an attempt to refresh an expired, or otherwise resolve an unavailable RRset in cache is made. For that to work, a value of zero must be specified for the stale-answer-client-timeout statement. To better understand the logic implemented, there are three flags being used during database lookup and other parts of code that must be understood: . DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a RRset due to timeout (resolver-query-timeout), its intent is to try to look for stale data in cache as a fallback, but only if stale answers are enabled in configuration. This flag is also used to activate stale-refresh-time window, since it is the only way the database knows that a resolution has failed. . DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database that it may use stale data. It is always set during query lookup if stale answers are enabled, but only effectively used during stale-refresh-time window. Also during this window, the resolver will not try to resolve the query, in other words no attempt to refresh the data in cache is made when the stale-refresh-time window is active. . DNS_DBFIND_STALEONLY: This newly introduced flag is used when we want stale data from the database, but not due to a failure in resolution, it also doesn't require stale-refresh-time window timer to be active. As long as there is a stale RRset available, it should be returned. It is mainly used in two situations: 1. When stale-answer-client-timeout timer is triggered: in that case we want to know if there is stale data available to answer the client. 2. When stale-answer-client-timeout value is set to zero: in that case, we also want to know if there is some stale RRset available to promptly answer the client. 
We must also discern between three situations that may happen when resolving a query after the addition of stale-answer-client-timeout statement, and how to handle them: 1. Are we running query_lookup() due to stale-answer-client-timeout timer being triggered? In this case, we look for stale data, making use of DNS_DBFIND_STALEONLY flag. If a stale RRset is available then respond to the client with the data found, mark this query as answered (query attribute NS_QUERYATTR_ANSWERED), so when the fetch completes the client won't be answered twice. We must also take care of not detaching from the client, as a fetch will still be running in background, this is handled by the following snippet: if (!QUERY_STALEONLY(&client->query)) { isc_nmhandle_detach(&client->reqhandle); } Which basically tests if DNS_DBFIND_STALEONLY flag is set, which means we are here due to a stale-answer-client-timeout timer expiration. 2. Are we running query_lookup() due to resolver-query-timeout being triggered? In this case, DNS_DBFIND_STALEOK flag will be set and an attempt to look for stale data will be made. As already explained, this flag is also used to activate stale-refresh-time window, as it means that we failed to refresh a RRset due to timeout. It is ok in this situation to detach from the client, as the fetch is already completed. 3. Are we running query_lookup() during the first time, looking for a RRset in cache and stale-answer-client-timeout value is set to zero? In this case, if stale answers are enabled (probably), we must do an initial database lookup with DNS_DBFIND_STALEONLY flag set, to indicate to the database that we want stale data. If we find an active RRset, proceed as normal, answer the client and the query is done. If we find a stale RRset we respond to the client and mark the query as answered, but don't detach from the client yet as an attempt in refreshing the RRset will still be made by means of the newly introduced function 'query_resolve'. 
If no active or stale RRset is available, begin resolution as usual. (cherry picked from commit `e219422575`)	2021-01-29 10:39:09 +01:00
Diego Fronza	0aebad96b5	Added option for disabling stale-answer-client-timeout This commit allows specifying "disabled" or "off" in the stale-answer-client-timeout statement. The logic to support this behavior will be added in the subsequent commits. This commit also ensures an upper bound to stale-answer-client-timeout which equals one second less than 'resolver-query-timeout'. (cherry picked from commit `0ad6f594f6`)	2021-01-29 10:38:58 +01:00
Diego Fronza	3478794a5d	Add stale-answer-client-timeout option The general logic behind the addition of this new feature works as follows: When a client query arrives, the basic path (query.c / ns_query_recurse) was to create a fetch, waiting for completion in fetch_callback. With the introduction of stale-answer-client-timeout, a new event of type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale answers are enabled and the fetch took longer than stale-answer-client-timeout to complete. When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we must ensure that the following happens: 1. Setup a new query context with the sole purpose of looking up for stale RRset only data, for that matter a new flag was added 'DNS_DBFIND_STALEONLY' used in database lookups. . If a stale RRset is found, mark the original client query as answered (with a new query attribute named NS_QUERYATTR_ANSWERED), so when the fetch completion event is received later, we avoid answering the client twice. . If a stale RRset is not found, cleanup and wait for the normal fetch completion event. 2. In ns_query_done, we must change this part: /* * If we're recursing then just return; the query will * resume when recursion ends. */ if (RECURSING(qctx->client)) { return (qctx->result); } To this: if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) { return (qctx->result); } Otherwise we would not proceed to answer the client if it happened that a stale answer was found when looking up for stale only data. When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we proceed as before, resuming query, updating stats, etc, but a few exceptions had to be added, most important of which are two: 1. Before answering the client (ns_client_send), check if the query wasn't already answered before. 2. Before detaching a client, e.g. 
isc_nmhandle_detach(&client->reqhandle), ensure that this is the fetch completion event, and not the one triggered due to stale-answer-client-timeout, so a correct call would be: if (!QUERY_STALEONLY(client)) { isc_nmhandle_detach(&client->reqhandle); } Other than these notes, comments were added in code in attempt to make these updates easier to follow. (cherry picked from commit `171a5b7542`)	2021-01-29 10:38:32 +01:00
Diego Fronza	7bf8950a0a	Added dns_view_staleanswerenabled() function Since it takes a couple lines of code to check whether stale answers are enabled for a given view, code was extracted out to a proper function. (cherry picked from commit `74840ec50b`)	2021-01-29 10:35:26 +01:00
Diego Fronza	f3bd27373d	Avoid iterating name twice when constructing fctx->info This is a minor performance improvement, we store the result of the first call to strlcat to use as an offset in the next call when constructing fctx->info string. (cherry picked from commit `49c40827f6`)	2021-01-29 10:35:17 +01:00
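The optimization in commit f3bd27373d avoids rescanning the destination string by remembering where the previous append ended. The sketch below shows the same idea with a small local helper instead of `strlcat`, since `strlcat` is not available in every C library. `append_at`, `build_info`, and the "name/type" layout are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into buf starting at offset off,
 * truncating so that at most size bytes (including the terminating
 * NUL) are ever used. Returns the new string length, which the
 * caller passes back in as the next offset.
 */
static size_t
append_at(char *buf, size_t size, size_t off, const char *src) {
	assert(buf != NULL && src != NULL && size > 0 && off < size);

	size_t room = size - off - 1; /* bytes left before the NUL */
	size_t n = strlen(src);
	if (n > room) {
		n = room;
	}
	memcpy(buf + off, src, n);
	buf[off + n] = '\0';
	return off + n;
}

/* Build "name/type" without re-walking the part already written. */
static size_t
build_info(char *buf, size_t size, const char *name, const char *type) {
	size_t off = append_at(buf, size, 0, name);
	off = append_at(buf, size, off, "/");
	return append_at(buf, size, off, type);
}
```

Each call starts writing at the offset returned by the previous one, so the name is traversed once when it is copied and never again when the separator and type are appended.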
Mark Andrews	6a0b751555	Require 'ctx' to be non-NULL in cfg_acl_fromconfig{,2} (cherry picked from commit `a8b55992a8`)	2021-01-28 13:43:47 +11:00
Mark Andrews	afc75de0cc	Optimise dnssec-verify dns_dnssec_keyfromrdata() only needs to be called once per DNSKEY rather than once per verification attempt. (cherry picked from commit `c75b325832`)	2021-01-28 12:18:31 +11:00
Mark Andrews	b416d8fcdf	Improve the diagnostic 'rndc retransfer' error message (cherry picked from commit `dd3520ae41`)	2021-01-28 09:44:26 +11:00
Matthijs Mekking	4a36b6d918	Make opensslecdsa_parse use fromlabel When 'opensslecdsa_parse()' encounters a label tag in the private key file, load the private key with 'opensslecdsa_fromlabel()'. Otherwise load it from the private structure. This was attempted before with 'load_privkey()' and 'uses_engine()', but that approach had the same flaw that 'opensslecdsa_fromlabel()' had previously: getting the private and public key separately and juggling pointers between EC_KEY and EVP_PKEY did not create a valid cryptographic key that could be used for signing. (cherry picked from commit `57ac70ad46`)	2021-01-26 15:04:59 +01:00
Matthijs Mekking	97185ecac2	Simplify opensslecdsa_fromlabel The 'opensslecdsa_fromlabel()' function does not need to get the OpenSSL engine twice to load the private and public key. Also no need to call 'dst_key_to_eckey()' as the EC_KEY can be derived from the loaded EVP_PKEY's. Add some extra checks to ensure the key has the same base id and curve (group nid) as the dst key. Since we already have the EVP_PKEY, no need to call 'finalize_eckey()', instead just set the right values in the key structure. (cherry picked from commit `393052d6ff`)	2021-01-26 15:04:51 +01:00
Matthijs Mekking	f555cec0af	Replace EVP_DigestFinal with EVP_DigestFinal_ex The openssl docs claim that EVP_DigestFinal() is obsolete and that one should use EVP_DigestFinal_ex() instead. (cherry picked from commit `1fcd0ef8bd`)	2021-01-26 15:04:38 +01:00
Matthijs Mekking	9e2ea5efb1	Don't set pubkey if eckey already has public key The 'ecdsa_check()' function tries to correctly set the public key on the eckey, but this should be skipped if the public key is retrieved via the private key. (cherry picked from commit `06b9724152`)	2021-01-26 15:04:21 +01:00
Matthijs Mekking	e3acfb44d5	ECDSA code should not use RSA label The 'opensslecdsa_tofile()' function tags the label as an RSA label, that is a copy paste error and should be of course an ECDSA label. (cherry picked from commit `46afeca8bf`)	2021-01-26 15:04:11 +01:00
Matthijs Mekking	8b25d3ab57	Correctly update pointers to pubkey and privkey The functions 'load_pubkey_from_engine()' and 'load_privkey_from_engine()' did not correctly store the pointers. Update both functions to add 'EC_KEY_set_public_key()' and 'EC_KEY_set_private_key()' respectively, so that the pointers to the public and private keys survive the "load from engine" functions. (cherry picked from commit `01239691a1`)	2021-01-26 15:04:03 +01:00
Matthijs Mekking	f66df9f1b7	load_pubkey_from_engine() should load public key The function 'load_pubkey_from_engine()' made a call to the libssl function 'ENGINE_load_private_key'. This is a copy paste error and should be 'ENGINE_load_public_key'. (cherry picked from commit `370285a62d`)	2021-01-26 15:03:43 +01:00
Evan Hunt	077e2c2a74	add serial number to "transfer ended" log messages	2021-01-26 12:38:32 +01:00
Evan Hunt	2df6ffc051	check size ratio when responding to IXFR requests	2021-01-26 12:38:32 +01:00
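A size-ratio check of this kind could look like the sketch below. It assumes the ratio is a percentage of the full zone transfer size and that exceeding it means falling back to a full transfer; neither detail is stated in the commit message, and `use_full_transfer` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: return true when the incremental delta is larger
 * than 'ratio_percent' percent of the full zone size. A ratio of 0 is
 * treated as "no limit" in this sketch.
 */
static bool
use_full_transfer(uint64_t ixfr_size, uint64_t zone_size,
		  uint32_t ratio_percent) {
	if (ratio_percent == 0) {
		return false;
	}
	/* Cross-multiply to avoid integer division; assumes the sizes
	 * are small enough that the products do not overflow 64 bits. */
	return ixfr_size * 100 > zone_size * (uint64_t)ratio_percent;
}
```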
Evan Hunt	9950247c78	improve calculation of database transfer size - change name of 'bytes' to 'xfrsize' in dns_db_getsize() parameter list and related variables; this is a more accurate representation of what the function is doing - change the size calculations in dns_db_getsize() to more accurately represent the space needed for a XFR message or journal file to contain the data in the database. previously we returned the sizes of all rdataslabs, including header overhead and offset tables, which resulted in the database size being reported as much larger than the equivalent XFR or journal. - map files caused a particular problem here: the fullname can't be determined from the node while a file is being deserialized, because the uppernode pointers aren't set yet. so we store "full name length" in the dns_rbtnode structure while serializing, and clear it after deserialization is complete.	2021-01-26 12:38:32 +01:00
Evan Hunt	70df95e9f5	dns_journal_iter_init() can now return the size of the delta the call initializing a journal iterator can now optionally return to the caller the size in bytes of an IXFR message (not including DNS header overhead, signatures etc) containing the differences from the beginning to the ending serial number. this is calculated by scanning the journal transaction headers to calculate the transfer size. since journal file records contain a length field that is not included in IXFR messages, we subtract out the length of those fields from the overall transaction length. this necessitated adding an "RR count" field to the journal transaction header, so we know how many length fields to subtract. NOTE: this will make existing journal files stop working!	2021-01-26 12:38:32 +01:00
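The arithmetic described in the journal iterator commit can be sketched as below. The `xhdr_t` layout and the 4-byte width of the per-record length field are assumptions made for illustration; the actual on-disk journal format is not given in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed width of the per-RR length field stored in the journal. */
#define RR_LENGTH_FIELD_SIZE 4u

/* Hypothetical, reduced transaction header. */
typedef struct {
	uint32_t size;     /* transaction length, including length fields */
	uint32_t rr_count; /* the "RR count" field added by the commit */
} xhdr_t;

/*
 * Sum the transaction sizes and subtract the length fields, which
 * appear in the journal but not in an IXFR message.
 */
static uint64_t
ixfr_delta_size(const xhdr_t *hdrs, size_t n) {
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t lengths = (uint64_t)hdrs[i].rr_count *
				   RR_LENGTH_FIELD_SIZE;
		assert(lengths <= hdrs[i].size);
		total += hdrs[i].size - lengths;
	}
	return total;
}
```

For example, transactions of 100 bytes with 5 RRs and 60 bytes with 3 RRs contribute 80 and 48 bytes respectively.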
Evan Hunt	57aadd6cea	add syntax and setter/getter functions to configure max-ixfr-ratio	2021-01-26 12:38:32 +01:00
Ondřej Surý	0e25af628c	Use -release instead of -version-info for internal library SONAMEs The BIND 9 libraries are considered to be internal only and hence the API and ABI change a lot. Keeping track of the API/ABI changes takes time and is a complicated matter, as the safest way to make everything stable would be to bump every library in the dependency chain. In theory, if libns links with libdns, a binary links with both, and we bump the libdns SOVERSION but not the libns SOVERSION, the old libns might be loaded by the binary and pull in the old libdns alongside the new libdns loaded by the binary. The situation gets even more complicated with plugins that have been compiled against BIND 9 libraries a few versions old and are then dynamically loaded into named. We are picking the safest option possible and usable for internal libraries - instead of using -version-info, which has only a weak link to the BIND 9 version number, we are using the -release libtool option, which embeds the corresponding BIND 9 version number into the library name. That means that instead of libisc.so.1608 (as an example) the library will now be named libisc-9.16.10.so. (cherry picked from commit `c605d75ea5`)	2021-01-25 15:28:09 +01:00
Tinderbox User	536bc1163a	prep 9.16.11	2021-01-21 09:11:54 +01:00
Evan Hunt	1a32a4d001	prevent "primaries" lists from having duplicate names it is now an error to have two primaries lists with the same name. this is true regardless of whether the "primaries" or "masters" keywords were used to define them. (cherry picked from commit `f619708bbf`)	2021-01-12 15:21:14 +01:00
Evan Hunt	746aa2581c	add "primary-only" as a synonym for "master-only" update the "notify" option to use RFC 8499 terminology as well. (cherry picked from commit `424a3cf3cc`)	2021-01-12 15:21:14 +01:00
Evan Hunt	04b9cdb53c	add "primaries" as a synonym for "masters" in named.conf as "type primary" is preferred over "type master" now, it makes sense to make "primaries" available as a synonym too. added a correctness check to ensure "primaries" and "masters" cannot both be used in the same zone. (cherry picked from commit `16e14353b1`)	2021-01-12 15:21:14 +01:00
Matthijs Mekking	c4520620dc	Fix signatures-validity config option KASP was using 'signatures-validity-dnskey' instead of 'signatures-validity'. (cherry picked from commit `ad63e9e4f8`)	2021-01-12 13:13:05 +01:00
Mark Andrews	07e899f616	Inactive incorrectly incremented It is possible to have two threads destroying an rbtdb at the same time when detachnode() executes and removes the last reference to a node between exiting being set to true for the node and testing if the references are zero in maybe_free_rbtdb(). Move NODE_UNLOCK() to after checking if references is zero to prevent detachnode() changing the reference count too early. (cherry picked from commit `859d2fdad6`)	2021-01-06 16:33:32 +11:00
Matthijs Mekking	63e58f09a5	Fix dnssec-signzone and -verify logging (again) While fixing #2359, 'report()' was changed so that it would print the newline. Newlines were missing from the output of 'dnssec-signzone' and 'dnssec-verify' because change `664b8f04f5` moved the printing from newlines to the library. This had to be reverted because this also would print redundant newlines in logfiles. While doing the revert, some newlines in 'lib/dns/zoneverify.c' were left in place, now making 'dnssec-signzone' and 'dnssec-verify' print too many newlines. This commit removes those newlines, so that the output looks nice again. (cherry picked from commit `18c62a077e`)	2021-01-05 13:41:49 +01:00
Matthijs Mekking	d564ad5f52	Update keymgr to allow transition to insecure mode The keymgr prevented zones from going to insecure mode. If we have a policy with an empty key list this is a signal that the zone wants to go back to insecure mode. In this case allow one extra state transition to be valid when checking for DNSSEC safety. (cherry picked from commit `9134100069`)	2020-12-23 11:56:54 +01:00
Matthijs Mekking	6da379d844	Publish CDS/CDNSKEY Delete Records Check if zone is transitioning from secure to insecure. If so, delete the CDS/CDNSKEY records, otherwise make sure they are not part of the RRset. (cherry picked from commit `68d715a229`)	2020-12-23 11:56:44 +01:00
Matthijs Mekking	cf0439cd5f	Treat dnssec-policy "none" as a builtin zone Configure "none" as a builtin policy. Change the 'cfg_kasp_fromconfig' api so that the 'name' will determine what policy needs to be configured. When transitioning a zone from secure to insecure, there will be cases when a zone with no DNSSEC policy (dnssec-policy none) should be using KASP. When there are key state files available, this is an indication that the zone once was DNSSEC signed but is reconfigured to become insecure. If we would not run the keymgr, named would abruptly remove the DNSSEC records from the zone, making the zone bogus. Therefore, change the code such that a zone will use kasp if there is a valid dnssec-policy configured, or if there are state files available. (cherry picked from commit `cf420b2af0`)	2020-12-23 11:56:33 +01:00
Matthijs Mekking	6ff69ee8ba	Add function to see if dst key uses kasp For purposes of zones transitioning back to insecure mode, it is practical to see if related keys have a state file associated. (cherry picked from commit `8f2c5e45da`)	2020-12-23 11:56:25 +01:00
Mark Andrews	4d003dd0f8	Only pick CPUs that are part of the existing CPU affinity set when assigning a thread to a CPU. (cherry picked from commit `698d9285d4`)	2020-12-23 09:21:29 +11:00
Ondřej Surý	04f9f45c54	Print warning when falling back to increment soa serial method When using the `unixtime` or `date` method to update the SOA serial, `named` and `dnssec-signzone` would silently fall back to the `increment` method to prevent the new serial number from being smaller than the old serial number (using serial number arithmetic). Add a warning message when such a fallback happens. (cherry picked from commit `ef685bab5c`)	2020-12-12 07:55:29 +01:00
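The fallback condition can be sketched with RFC 1982 serial number arithmetic. The function names are hypothetical, and skipping 0 when incrementing is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1982 comparison for 32-bit serials: is 'a' greater than 'b'? */
static bool
serial_gt(uint32_t a, uint32_t b) {
	uint32_t d = a - b; /* wraps modulo 2^32 */
	return d != 0 && d < 0x80000000u;
}

/*
 * Hypothetical helper: use 'candidate' (e.g. a unixtime or date-based
 * serial) if it moves the serial forward, otherwise fall back to
 * old + 1 and report the fallback so the caller can log a warning.
 */
static uint32_t
next_serial(uint32_t old, uint32_t candidate, bool *fell_back) {
	assert(fell_back != NULL);

	if (serial_gt(candidate, old)) {
		*fell_back = false;
		return candidate;
	}
	*fell_back = true;
	uint32_t next = old + 1;
	if (next == 0) {
		next = 1; /* assumption: avoid a serial of 0 */
	}
	return next;
}
```

A zone whose serial was last set in date style (2021010100) and is then switched to `unixtime` hits the fallback, because the Unix timestamp is smaller in serial arithmetic.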
Ondřej Surý	2c04299eb1	Fix HAVE_SO_REUSEPORT_LB macro name definition A typo in macro definition caused the load-balanced sockets to be disabled even on platforms with existing support for load-balanced sockets. (cherry picked from commit `5caf33feda`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	90979a79e2	Sync the func() -> func(void) in netmgr	2020-12-09 10:46:16 +01:00
Ondřej Surý	bb9b55dfba	Use sock->nchildren instead of mgr->nworkers when initializing NM On Windows, we were limiting the number of listening children to just 1, but we were then iterating on mgr->nworkers. That led to scheduling more async_*listen() calls than were actually allocated, and to out-of-bounds read/write operations on the heap. (cherry picked from commit `87c5867202`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	857704b879	Explicitly link the netmgr tests with -luv	2020-12-09 10:46:16 +01:00
Ondřej Surý	7ec4ec3a81	Fix datarace when UDP/TCP connect fails and we are in nmthread When we were in nmthread, the isc__nm_async_<proto>connect() function executes in the same thread as the isc__nm_<proto>connect() and on a failure, it would block indefinitely because the failure branch was setting sock->active to false before the condition around the wait had a chance to skip the WAIT(). This also fixes the zero system test being stuck on FreeBSD 11, so we re-enable the test in the commit.	2020-12-09 10:46:16 +01:00
Ondřej Surý	90a9b0611a	Add FreeBSD connection timeout socket option On FreeBSD, the option to configure connection timeout is called TCP_KEEPINIT, use it to configure the connection timeout there. This also fixes the dangling socket problems in the unit test, so re-enable them.	2020-12-09 10:46:16 +01:00
Ondřej Surý	0ee8672692	Distribute queries among threads even on platforms without lb sockets On platforms without load-balancing sockets, all the queries would be handled by a single thread. Currently, support for load-balanced sockets is present in Linux with SO_REUSEPORT and FreeBSD 12 with SO_REUSEPORT_LB. This commit adds a workaround for platforms without such support that: 1. Sets up a single shared listening socket for all listening nmthreads for the UDP, TCP and TCPDNS netmgr transports. 2. Calls uv_udp_bind/uv_tcp_bind on the underlying socket just once and, for the rest of the nmthreads, only copies the internal libuv flags (should be just UV_HANDLE_BOUND and optionally UV_HANDLE_IPV6). 3. Starts reading on the UDP socket or listening on the TCP socket. The load distribution among the nmthreads is uneven, but it's still better than utilizing just one thread for processing all the incoming queries.	2020-12-09 10:46:16 +01:00
Ondřej Surý	4c70100ce0	Don't use stack allocated buffer for uv_write() On FreeBSD, the stack is destroyed more aggressively than on Linux and that revealed a bug where we were allocating the 16-bit len for the TCPDNS message on the stack and the buffer got garbled before the uv_write() sendback was executed. Now, the len is part of the uvreq, so we can safely pass it to the uv_write() as the req gets destroyed after the sendcb is executed. (cherry picked from commit `94afea9325`)	2020-12-09 10:46:16 +01:00
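The fix described in the uv_write() commit amounts to tying the lifetime of the length prefix to the request object rather than to a stack frame. The sketch below shows that idea without libuv; `send_req_t`, `req_new` and `req_free` are hypothetical names, not the netmgr types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical request object. The 2-byte length prefix lives inside
 * the heap-allocated request, so it stays valid until the request is
 * freed (in the real code, after the send callback has run).
 */
typedef struct {
	uint8_t tcplen[2];  /* 16-bit message length, network byte order */
	const uint8_t *msg; /* message payload, owned by the caller */
	uint16_t msglen;
} send_req_t;

static send_req_t *
req_new(const uint8_t *msg, uint16_t len) {
	send_req_t *r = malloc(sizeof(*r));
	if (r == NULL) {
		return NULL;
	}
	r->tcplen[0] = (uint8_t)(len >> 8);
	r->tcplen[1] = (uint8_t)(len & 0xff);
	r->msg = msg;
	r->msglen = len;
	return r;
}

/* To be called once the (asynchronous) write has completed. */
static void
req_free(send_req_t *r) {
	free(r);
}
```

An asynchronous write would be handed pointers to `r->tcplen` and `r->msg`; both remain readable after the function that queued the write has returned.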
Michał Kępień	12fa8a7aed	Make netmgr initialize and cleanup Winsock itself On Windows, WSAStartup() needs to be called to initialize Winsock before any sockets are created or else socket() calls will return error code 10093 (WSANOTINITIALISED). Since BIND's Network Manager is intended to work as a reusable networking library, it should take care of calling WSAStartup() - and its cleanup counterpart, WSACleanup() - itself rather than relying on external code to do it. Add the necessary WSAStartup() and WSACleanup() calls to isc_nm_start() and isc_nm_destroy(), respectively. (cherry picked from commit `88f96faba8`)	2020-12-09 10:46:16 +01:00
Michał Kępień	216fc34490	Extend log message for unexpected socket() errors Make sure the error code is included in the message logged for unexpected socket creation errors in order to facilitate troubleshooting on Windows. (cherry picked from commit `dc2e1dea86`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	e8e8ed7fb9	Adjust the nstests for isc_nmhandle_{attach,detach} name change Due to the added attach/detach tracing in the netmgr-v2 code, the libns tests need to be adjusted as the real function names have changed from isc_nmhandle_* to isc__nmhandle_*.	2020-12-09 10:46:16 +01:00
Ondřej Surý	9b2184893d	The cmocka.h header MUST be included before isc/util.h gets included The isc/util.h header redefines the DbC checks (REQUIRE, INSIST, ...) to be cmocka "fake" assertions. However that means that cmocka.h needs to be included after UNIT_TESTING is defined but before isc/util.h is included. Because isc/util.h is included in most of the project headers this means that the sequence MUST be: #define UNIT_TESTING #include <cmocka.h> #include <isc/_anything_.h> See !2204 for other header requirements for including cmocka.h. (cherry picked from commit `0ba697fe8c`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	7fc62f829d	Add libssl libraries to Windows build This commit extends the perl Configure script to also check for libssl in addition to libcrypto and change the vcxproj source files to link with both libcrypto and libssl.	2020-12-09 10:46:16 +01:00
Ondřej Surý	48759bd047	Fix the data race in accessing the isc_nm_t timers The following TSAN report about accessing the mgr timers (mgr->init, mgr->idle, mgr->keepalive and mgr->advertised) has been fixed in this commit: ================== WARNING: ThreadSanitizer: data race (pid=2746) Read of size 4 at 0x7b440008a948 by thread T18: #0 isc__nm_tcpdns_read /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 (libisc.so.1706+0x2ba0f) #1 isc_nm_read /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1679:3 (libisc.so.1706+0x22258) #2 tcpdns_connect_connect_cb /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:363:2 (tcpdns_test+0x4bc5fb) #3 isc__nm_async_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1816:2 (libisc.so.1706+0x228c9) #4 isc__nm_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1791:3 (libisc.so.1706+0x22713) #5 tcpdns_connect_cb /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:343:2 (libisc.so.1706+0x2d89d) #6 uv__stream_connect /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1381:5 (libuv.so.1+0x27c18) #7 uv__stream_io /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1298:5 (libuv.so.1+0x25977) #8 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11 (libuv.so.1+0x2e795) #9 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:385:5 (libuv.so.1+0x158ec) #10 nm_thread /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:530:11 (libisc.so.1706+0x1c94a) Previous write of size 4 at 0x7b440008a948 by main thread: #0 isc_nm_settimeouts /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:490:12 (libisc.so.1706+0x1dda5) #1 tcpdns_recv_two /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:601:2 (tcpdns_test+0x4bad0e) #2 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70be) #3 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) Location is heap block of size 281 at 0x7b440008a840 allocated by main thread: #0 malloc <null> 
(tcpdns_test+0x42864b) #1 default_memalloc /home/ondrej/Projects/bind9/lib/isc/mem.c:713:8 (libisc.so.1706+0x6d261) #2 mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:622:8 (libisc.so.1706+0x69b9c) #3 isc___mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:1044:9 (libisc.so.1706+0x6d379) #4 isc__mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:2432:10 (libisc.so.1706+0x6889e) #5 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:203:8 (libisc.so.1706+0x1c219) #6 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4) #7 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd) #8 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) Thread T18 'isc-net-0000' (tid=3513, running) created by main thread at: #0 pthread_create <null> (tcpdns_test+0x429e7b) #1 isc_thread_create /home/ondrej/Projects/bind9/lib/isc/pthreads/thread.c:73:8 (libisc.so.1706+0x8476a) #2 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:271:3 (libisc.so.1706+0x1c66a) #3 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4) #4 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd) #5 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 in isc__nm_tcpdns_read ================== ThreadSanitizer: reported 1 warnings (cherry picked from commit `2e1dd56d0b`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	a61b7294c2	Avoid netievent allocations when the callbacks can be called directly After turning the users callbacks to be asynchronous, there was a visible performance drop. This commit prevents the unnecessary allocations while keeping the code paths same for both asynchronous and synchronous calls. The same change was done to the isc__nm_udp_{read,send} as those two functions are in the hot path. (cherry picked from commit `d6d2fbe0e9`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	7b9c8b9781	Refactor netmgr and add more unit tests This is a part of the work that intends to make the netmgr stable, testable, maintainable and tested. It contains numerous changes to the netmgr code and unfortunately, it was not possible to split this into smaller chunks as the work here needs to be committed as a complete work. NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and tcpdns.c and it should be a subject to refactoring in the future. The changes that are included in this commit are listed here (extensively, but not exclusively): * The netmgr_test unit test was split into individual tests (udp_test, tcp_test, tcpdns_test and newly added tcp_quota_test) * The udp_test and tcp_test have been extended to allow programmatic failures from the libuv API. Unfortunately, we can't use cmocka mock() and will_return(), so we emulate the behaviour with #define and including the netmgr/{udp,tcp}.c source file directly. * The netievents that we put on the nm queue have a variable number of members; out of these, the isc_nmsocket_t and isc_nmhandle_t always need to be attached before enqueueing the netievent_<foo> and detached after we have called the isc_nm_async_<foo> to ensure that the socket (handle) doesn't disappear between scheduling the event and actually executing the event. * Cancelling the in-flight TCP connection using libuv requires calling uv_close() on the original uv_tcp_t handle, which just breaks too many assumptions we have in the netmgr code. Instead of using uv_timer for TCP connection timeouts, we use a platform-specific socket option. * Fix the synchronization between {nm,async}_{listentcp,tcpconnect} When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was waiting for socket to either end up with error (that path was fine) or to be listening or connected using condition variable and mutex. Several things could happen: 0. everything is ok 1. the waiting thread would miss the SIGNAL() - because the enqueued event would be processed faster than we could start WAIT()ing. In case the operation would end up with error, it would be ok, as the error variable would be unchanged. 2. the waiting thread would miss the sock->{connected,listening} = `true` state, as it would be set to `false` in the tcp_{listen,connect}close_cb() because the connection would be so short lived that the socket would be closed before we could even start WAIT()ing * The tcpdns has been converted to using libuv directly. Previously, the tcpdns protocol used tcp protocol from netmgr, this proved to be very complicated to understand, fix and make changes to. The new tcpdns protocol is modeled in a similar way to the tcp netmgr protocol. Closes: #2194, #2283, #2318, #2266, #2034, #1920 * The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to pass accepted TCP sockets between netthreads, but instead (similar to UDP) uses per netthread uv_loop listener. This greatly reduces the complexity as the socket is always run in the associated nm and uv loops, and we are also not touching the libuv internals. There's an unfortunate side effect though, the new code requires support for load-balanced sockets from the operating system for both UDP and TCP (see #2137). If the operating system doesn't support the load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB on FreeBSD 12+), the number of netthreads is limited to 1. * The netmgr has now two debugging #ifdefs: 1. Already existing NETMGR_TRACE prints any dangling nmsockets and nmhandles before triggering assertion failure. This option would reduce performance when enabled, but in theory, it could be enabled on low-performance systems. 2. New NETMGR_TRACE_VERBOSE option has been added that enables extensive netmgr logging that allows the software engineer to precisely track any attach/detach operations on the nmsockets and nmhandles. This is not suitable for any kind of production machine, only for debugging. * The tlsdns netmgr protocol has been split from the tcpdns and it still uses the old method of stacking the netmgr boxes on top of each other. We will have to refactor the tlsdns netmgr protocol to use the same approach - build the stack using only libuv and openssl. * Limit but not assert the tcp buffer size in tcp_alloc_cb Closes: #2061 (cherry picked from commit `634bdfb16d`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	fa9ca83862	Turn all the callback to be always asynchronous When calling the high level netmgr functions, the callback would be sometimes called synchronously if we catch the failure directly, or asynchronously if it happens later. The synchronous call to the callback could create deadlocks as the caller would not expect the failed callback to be executed directly. (cherry picked from commit `a49d88568f`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	bcc9ad98ea	netmgr: Add additional safeguards to netmgr/tls.c This commit adds a couple of additional safeguards against running sends/reads on inactive sockets. The changes were modeled after the changes we made to netmgr/tcpdns.c (cherry picked from commit `fa424225af`)	2020-12-09 10:46:16 +01:00
Witold Kręcicki	b83dff0585	isc_nm_tls_create_server_ctx can create ephemeral certs In-memory ephemeral certs creation for easy DoT/DoH deployment. (cherry picked from commit `3c00fb71db`)	2020-12-09 10:46:16 +01:00
Witold Kręcicki	d7fa046a69	Add DoT support to bind Parse the configuration of tls objects into SSL_CTX* objects. Listen on DoT if 'tls' option is setup in listen-on directive. Use DoT/DoH ports for DoT/DoH. (cherry picked from commit `38b78f59a0`)	2020-12-09 10:46:16 +01:00
Evan Hunt	0f5fff5c1e	report peer address in TLS mode, and specify protocol - peer address was not being reported correctly by "dig +tls" - the protocol used is now reported in the dig output: UDP, TCP, or TLS. (cherry picked from commit `8886569e9d`)	2020-12-09 10:46:16 +01:00
Witold Kręcicki	4a854da141	netmgr: server-side TLS support Add server-side TLS support to netmgr - that includes moving some of the isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper tcp or tls function, and a new isc_nm_listentls() function. Add DoT support to tcpdns - isc_nm_listentlsdns(). (cherry picked from commit `b2ee0e9dc3`)	2020-12-09 10:46:16 +01:00
Evan Hunt	6f6f0e26ab	address some possible shutdown races in xfrin there were two failures observed during testing, both occurring when 'rndc halt' was run rather than 'rndc stop' - the latter dumps zone contents to disk and presumably introduced enough delay to prevent the races: - a failure when the zone was shut down and called dns_xfrin_detach() before the xfrin had finished connecting; the connect timeout terminated without detaching its handle - a failure when the tcpdns socket timer fired after the outerhandle had already been cleared. this commit incidentally addresses a failure observed in mutexatomic due to a variable having been initialized incorrectly.	2020-12-09 10:46:16 +01:00
Ondřej Surý	c4dcedd2dc	netmgr: Don't crash if socket() returns an error in udpconnect socket() call can return an error - e.g. EMFILE, so we need to handle this nicely and not crash. Additionally wrap the socket() call inside a platform independent helper function as the Socket data type on Windows is unsigned integer: > This means, for example, that checking for errors when the socket and > accept functions return should not be done by comparing the return > value with –1, or seeing if the value is negative (both common and > legal approaches in UNIX). Instead, an application should use the > manifest constant INVALID_SOCKET as defined in the Winsock2.h header > file. (cherry picked from commit `8af7f81d6c`)	2020-12-09 10:46:16 +01:00
Ondřej Surý	21daa258a2	netmgr: Always load the result from async socket Because we use result earlier for setting the load balancing on the socket, we could be left with an ISC_R_NOTIMPLEMENTED value stored in the variable and when the UDP connection would succeed, we would erroneously return this value instead of ISC_R_SUCCESS. (cherry picked from commit `050258bda4`)	2020-12-09 10:46:16 +01:00
Evan Hunt	70e08cab6b	dig: use new netmgr timeout mechanism use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid of connect_timeout() and related functions in dighost.c. (cherry picked from commit `ea2b04c361`)	2020-12-09 10:46:16 +01:00
Evan Hunt	4598d7b30d	add isc_nmhandle_settimeout() function this function sets the read timeout for the socket associated with a netmgr handle and, if the timer is running, resets it. for TCPDNS sockets it also sets the read timeout and resets the timer on the outer TCP socket. (cherry picked from commit `4be63c5b00`)	2020-12-09 10:46:16 +01:00


13071 commits