bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-04-29 09:59:08 -04:00

Author	SHA1	Message	Date
Michał Kępień	ba1306bfb4	Check for NULL before dereferencing qctx->rpz_st Commit `9ffb4a7ba1` causes Clang Static Analyzer to flag a potential NULL dereference in query_nxdomain(): query.c:9394:26: warning: Dereference of null pointer [core.NullDereference] if (!qctx->nxrewrite \|\| qctx->rpz_st->m.rpz->addsoa) { ^~~~~~~~~~~~~~~~~~~ 1 warning generated. The warning above is for qctx->rpz_st potentially being a NULL pointer when query_nxdomain() is called from query_resume(). This is a false positive because none of the database lookup result codes currently causing query_nxdomain() to be called (DNS_R_EMPTYWILD, DNS_R_NXDOMAIN) can be returned by a database lookup following a recursive resolution attempt. Add a NULL check nevertheless in order to future-proof the code and silence Clang Static Analyzer. (cherry picked from commit `07592d1315`) (cherry picked from commit `a4547a1093`)	2023-01-09 13:57:44 +01:00
Matthijs Mekking	2696267b1f	Consider non-stale data when in serve-stale mode With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;', consider the following situation: A CNAME record and its target record are in the cache, then the CNAME record expires, but the target record is still valid. When a new query for the CNAME record arrives, and the query fails, the stale record is used, and then the query "restarts" to follow the CNAME target. The problem is that the query's multiple stale options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()' treats the restarted query as a lookup following a failed lookup, and returns a SERVFAIL answer when there is no stale data found in the cache, even if there is valid non-stale data there available. With this change, query_lookup() now considers non-stale data in the cache in the first place, and returns it if it is available. (cherry picked from commit `91a1a8efc5`)	2023-01-09 13:57:43 +01:00
Tom Krizek	da42fa7622	Revert "Merge branch '3678-serve-stale-servfailing-unexpectedly-v9_16' into 'v9_16'" This reverts commit `b2a4447af8`, reversing changes made to `8924f92956`. It also removes release note 6038, since the fix is reverted.	2022-12-08 10:23:40 +01:00
Mark Andrews	4f3327cd41	Extend dns_db_allrdatasets to control interation results Add an options parameter to control what rdatasets are returned when iteratating over the node. Specific modes will be added later. (cherry picked from commit `7695c36a5d`)	2022-12-08 11:20:35 +11:00
Michał Kępień	148608c7b2	Check for NULL before dereferencing qctx->rpz_st Commit `9ffb4a7ba1` causes Clang Static Analyzer to flag a potential NULL dereference in query_nxdomain(): query.c:9394:26: warning: Dereference of null pointer [core.NullDereference] if (!qctx->nxrewrite \|\| qctx->rpz_st->m.rpz->addsoa) { ^~~~~~~~~~~~~~~~~~~ 1 warning generated. The warning above is for qctx->rpz_st potentially being a NULL pointer when query_nxdomain() is called from query_resume(). This is a false positive because none of the database lookup result codes currently causing query_nxdomain() to be called (DNS_R_EMPTYWILD, DNS_R_NXDOMAIN) can be returned by a database lookup following a recursive resolution attempt. Add a NULL check nevertheless in order to future-proof the code and silence Clang Static Analyzer. (cherry picked from commit `07592d1315`)	2022-12-06 13:51:30 +00:00
Matthijs Mekking	e6e13c3e62	Consider non-stale data when in serve-stale mode With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;', consider the following situation: A CNAME record and its target record are in the cache, then the CNAME record expires, but the target record is still valid. When a new query for the CNAME record arrives, and the query fails, the stale record is used, and then the query "restarts" to follow the CNAME target. The problem is that the query's multiple stale options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()' treats the restarted query as a lookup following a failed lookup, and returns a SERVFAIL answer when there is no stale data found in the cache, even if there is valid non-stale data there available. With this change, query_lookup() now considers non-stale data in the cache in the first place, and returns it if it is available. (cherry picked from commit `86a80e723f`)	2022-12-06 13:51:07 +00:00
Michal Nowak	771fed4a14	Update sources to Clang 15 formatting	2022-11-29 10:30:34 +01:00
Evan Hunt	8e4a1f3483	ensure RPZ lookups handle CD=1 correctly RPZ rewrites called dns_db_findext() without passing through the client database options; as as result, if the client set CD=1, DNS_DBFIND_PENDINGOK was not used as it should have been, and cache lookups failed, resulting in failure of the rewrite. (cherry picked from commit `305a50dbe1`)	2022-10-19 13:16:51 -07:00
Aram Sargsyan	b6aeccf697	Fix ns_statscounter_recursclients counting bug The incrementing and decrementing of 'ns_statscounter_recursclients' were not properly balanced: for example, it would be incremented for a prefetch query but not decremented if the query failed. This commit ensures that the recursion quota and the recursive clients counter are always in sync with each other. (cherry picked from commit `82991451b4`)	2022-10-18 10:38:04 +00:00
Matthijs Mekking	3f68e2ad83	Only refresh RRset once Don't attempt to resolve DNS responses for intermediate results. This may create multiple refreshes and can cause a crash. One scenario is where for the query there is a CNAME and canonical answer in cache that are both stale. This will trigger a refresh of the RRsets because we encountered stale data and we prioritized it over the lookup. It will trigger a refresh of both RRsets. When we start recursing, it will detect a recursion loop because the recursion parameters will eventually be the same. In 'dns_resolver_destroyfetch' the sanity check fails, one of the callers did not get its event back before trying to destroy the fetch. Move the call to 'query_refresh_rrset' to 'ns_query_done', so that it is only called once per client request. Another scenario is where for the query there is a stale CNAME in the cache that points to a record that is also in cache but not stale. This will trigger a refresh of the RRset (because we encountered stale data and we prioritized it over the lookup). We mark RRsets that we add to the message with DNS_RDATASETATTR_STALE_ADDED to prevent adding a duplicate RRset when a stale lookup and a normal lookup conflict with each other. However, the other non-stale RRset when following a CNAME chain will be added to the message without setting that attribute, because it is not stale. This is a variant of the bug in #2594. The fix covered the same crash but for stale-answer-client-timeout > 0. Fix this by clearing all RRsets from the message before refreshing. This requires the refresh to happen after the query is send back to the client. (cherry picked from commit `d939d2ecde`)	2022-09-08 12:08:28 +02:00
Aram Sargsyan	3ad0f165ab	Fix RRL responses-per-second bypass using wildcard names It is possible to bypass Response Rate Limiting (RRL) `responses-per-second` limitation using specially crafted wildcard names, because the current implementation, when encountering a found DNS name generated from a wildcard record, just strips the leftmost label of the name before making a key for the bucket. While that technique helps with limiting random requests like <random>.example.com (because all those requests will be accounted as belonging to a bucket constructed from "example.com" name), it does not help with random names like subdomain.<random>.example.com. The best solution would have been to strip not just the leftmost label, but as many labels as necessary until reaching the suffix part of the wildcard record from which the found name is generated, however, we do not have that information readily available in the context of RRL processing code. Fix the issue by interpreting all valid wildcard domain names as the zone's origin name concatenated to the "*" name, so they all will be put into the same bucket. (cherry picked from commit `baa9698c9d`)	2022-09-08 09:41:15 +02:00
Matthijs Mekking	dd7dde5743	Don't enable serve-stale on duplicate queries When checking if we should enable serve-stale, add an early out case when the result is an error signalling a duplicate query or a query that would be dropped. (cherry picked from commit 059a4c2f4d9d3cff371842f43208d021509314fa)	2022-08-09 09:37:49 +02:00
Evan Hunt	82c197d93b	Cleanup: always count ns_statscounter_recursclients The ns_statscounter_recursclients counter was previously only incremented or decremented if client->recursionquota was non-NULL. This was harmless, because that value should always be non-NULL if recursion is enabled, but it made the code slightly confusing. (cherry picked from commit `0201eab655`)	2022-05-14 00:58:26 -07:00
Mark Andrews	8f23d56fba	Check the cache as well when glue NS are returned processing RPZ (cherry picked from commit `8fb72012e3`)	2022-05-04 23:53:21 +10:00
Mark Andrews	8c2ede6edc	Process learned records as well as glue (cherry picked from commit `07c828531c`)	2022-05-04 23:53:21 +10:00
Mark Andrews	13129872eb	Process the delegating NS RRset when checking rpz rules (cherry picked from commit `cf97c61f48`)	2022-05-04 23:53:21 +10:00
Ondřej Surý	79b7804ce8	Consistenly use UNREACHABLE() instead of ISC_UNREACHABLE() In couple places, we have missed INSIST(0) or ISC_UNREACHABLE() replacement on some branches with UNREACHABLE(). Replace all ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().	2022-03-28 23:28:05 +02:00
Ondřej Surý	b624be2544	Remove use of the inline keyword used as suggestion to compiler Historically, the inline keyword was a strong suggestion to the compiler that it should inline the function marked inline. As compilers became better at optimising, this functionality has receded, and using inline as a suggestion to inline a function is obsolete. The compiler will happily ignore it and inline something else entirely if it finds that's a better optimisation. Therefore, remove all the occurences of the inline keyword with static functions inside single compilation unit and leave the decision whether to inline a function or not entirely on the compiler NOTE: We keep the usage the inline keyword when the purpose is to change the linkage behaviour. (cherry picked from commit `20f0936cf2`)	2022-03-25 09:37:18 +01:00
Ondřej Surý	75f9dd8e82	Simplify way we tag unreachable code with only ISC_UNREACHABLE() Previously, the unreachable code paths would have to be tagged with: INSIST(0); ISC_UNREACHABLE(); There was also older parts of the code that used comment annotation: /* NOTREACHED */ Unify the handling of unreachable code paths to just use: UNREACHABLE(); The UNREACHABLE() macro now asserts when reached and also uses __builtin_unreachable(); when such builtin is available in the compiler. (cherry picked from commit `584f0d7a7e`)	2022-03-25 09:33:51 +01:00
Ondřej Surý	673e53f81d	Add FALLTHROUGH macro for __attribute__((fallthrough)) Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which is explicit version of the /* FALLTHROUGH / comment we are currently using. Add and apply FALLTHROUGH macro that uses the attribute if available, but does nothing on older compilers. In one case (lib/dns/zone.c), using the macro revealed that we were using the / FALLTHROUGH */ comment in wrong place, remove that comment. (cherry picked from commit `fe7ce629f4`)	2022-03-25 09:30:16 +01:00
Ondřej Surý	2c86bd4ed9	Remove debugging implementation of stdatomic using mutexes Upcoming LLVM/Clang 15 has marked the ATOMIC_VAR_INIT() as deprecated breaking the build. In the previous commit, we have removed the use of ATOMIC_VAR_INIT(), but as that was a prerequisite to using the --enable-mutexatomic debugging mode, we have to remove the debugging mode.	2022-03-17 21:44:04 +01:00
Michał Kępień	60e82835ec	Fix more ns_statscounter_recursclients underflows Commit `aab691d512` did not fix all possible scenarios in which the ns_statscounter_recursclients counter underflows. The solution implemented therein can be ineffective e.g. when CNAME chaining happens with prefetching enabled. Here is an example recursive resolution scenario in which the ns_statscounter_recursclients counter can underflow with the current logic in effect: 1. Query processing starts, the answer is not found in the cache, so recursion is started. The NS_CLIENTATTR_RECURSING attribute is set. ns_statscounter_recursclients is incremented (Δ = +1). 2. Recursion completes, returning a CNAME. client->recursionquota is non-NULL, so the NS_CLIENTATTR_RECURSING attribute remains set. ns_statscounter_recursclients is decremented (Δ = 0). 3. Query processing restarts. 4. The current QNAME (the target of the CNAME from step 2) is found in the cache, with a TTL low enough to trigger a prefetch. 5. query_prefetch() attaches to client->recursionquota. ns_statscounter_recursclients is not incremented because query_prefetch() does not do that (Δ = 0). 6. Query processing restarts. 7. The current QNAME (the target of the CNAME from step 4) is not found in the cache, so recursion is started. client->recursionquota is already attached to (since step 5) and the NS_CLIENTATTR_RECURSING attribute is set (since step 1), so ns_statscounter_recursclients is not incremented (Δ = 0). 8. The prefetch from step 5 completes. client->recursionquota is detached from in prefetch_done(). ns_statscounter_recursclients is not decremented because prefetch_done() does not do that (Δ = 0). 9. Recursion for the current QNAME completes. client->recursionquota is already detached from, i.e. set to NULL (since step 8), and the NS_CLIENTATTR_RECURSING attribute is set (since step 1), so ns_statscounter_recursclients is decremented (Δ = -1). Another possible scenario is that after step 7, recursion for the target of the CNAME from step 4 completes before the prefetch for the CNAME itself. fetch_callback() then notices that client->recursionquota is non-NULL and decrements ns_statscounter_recursclients, even though client->recursionquota was attached to by query_prefetch() and therefore not accompanied by an incrementation of ns_statscounter_recursclients. The net result is also an underflow. Instead of trying to properly handle all possible orderings of events set into motion by normal recursion and prefetch-triggered recursion, adjust ns_statscounter_recursclients whenever the recursive clients quota is successfully attached to or detached from. Remove the NS_CLIENTATTR_RECURSING attribute altogether as its only purpose is made obsolete by this change. (cherry picked from commit `f7482b68b9`)	2022-02-23 14:45:06 +01:00
Evan Hunt	558b060de5	allow dns_clientinfo to store client ECS data this brings DNS_CLIENTINFO_VERSION into line with the subscription branch so that fixes applied to clientinfo processing can also be applied to the main branch without diverging. (cherry picked from commit `737e658602`)	2022-01-27 16:08:57 -08:00
Ondřej Surý	2bf7921c7e	Update the copyright information in all files in the repository This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds used licnses to LICENSES/ directory 2. Add "isc" template for adding the copyright boilerplate 3. Changes all source files to include copyright and SPDX license header, this includes all the C sources, documentation, zone files, configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to the new files. 4. Handle the rest that can't be modified via .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have license places next to them in <foo>.license file, but this would lead to cluttered repository and most of the files handled in the .reuse/dep5 file are system test files. (cherry picked from commit `58bd26b6cf`)	2022-01-11 12:22:09 +01:00
Evan Hunt	ddc677ae64	rename dns_zone_master and dns_zone_slave dns_zone_master and dns_zone_slave are renamed as dns_zone_primary and dns_zone_secondary. (cherry picked from commit `916760ae46`)	2021-08-30 11:58:29 -07:00
Mark Andrews	24e5e3ffd6	Make whether to follow additional data records generic Adds dns_rdatatype_followadditional() and DNS_RDATATYPEATTR_FOLLOWADDITIONAL (cherry picked from commit `f0265b8fa6`)	2021-08-18 14:59:20 +10:00
Michał Kępień	f81c8e3e73	Tweak query_addds() comments to avoid confusion It has been noticed that commit `f88c90f47f` did not only fix NSEC record handling in signed, insecure delegations prepared using both wildcard expansion and CNAME chaining - it also inadvertently fixed DS record handling in signed, secure delegations of that flavor. This is because the 'rdataset' variable in the relevant location in query_addds() can be either a DS RRset or an NSEC RRset. Update a code comment in query_addds() to avoid confusion. Update the comments describing the purpose of query_addds() so that they also mention NSEC(3) records. (cherry picked from commit `29d8d35869`)	2021-07-16 07:24:34 +02:00
Michał Kępień	f88c90f47f	Fix "no DS" proofs for wildcard+CNAME delegations When answering a query requires wildcard expansion, the AUTHORITY section of the response needs to include NSEC(3) record(s) proving that the QNAME does not exist. When a response to a query is an insecure delegation, the AUTHORITY section needs to include an NSEC(3) proof that no DS record exists at the parent side of the zone cut. These two conditions combined trip up the NSEC part of the logic contained in query_addds(), which expects the NS RRset to be owned by the first name found in the AUTHORITY section of a delegation response. This may not always be true, for example if wildcard expansion causes an NSEC record proving QNAME nonexistence to be added to the AUTHORITY section before the delegation is added to the response. In such a case, named incorrectly omits the NSEC record proving nonexistence of QNAME from the AUTHORITY section. The same block of code is affected by another flaw: if the same NSEC record proves nonexistence of both the QNAME and the DS record at the parent side of the zone cut, this NSEC record will be added to the AUTHORITY section twice. Fix by looking for the NS RRset in the entire AUTHORITY section and adding the NSEC record to the delegation using query_addrrset() (which handles duplicate RRset detection). (cherry picked from commit `7a87bf468b`)	2021-06-10 10:26:51 +02:00
Kevin Chen	3924f78748	Several serve-stale improvements Commit `a83c8cb0af` updated masterdump so that stale records in "rndc dumpdb" output no longer shows 0 TTLs. In this commit we change the name of the `rdataset->stale_ttl` field to `rdataset->expired` to make its purpose clearer, and set it to zero in cases where it's unused. Add 'rbtdb->serve_stale_ttl' to various checks so that stale records are not purged from the cache when they've been stale for RBTDB_VIRTUAL (300) seconds. Increment 'ns_statscounter_usedstale' when a stale answer is used. Note: There was a question of whether 'overmem_purge' should be purging ancient records, instead of stale ones. It is left as purging stale records, since stale records could take up the majority of the cache. This submission is copyrighted Akamai Technologies, Inc. and provided under an MPL 2.0 license. This commit was originally authored by Kevin Chen, and was updated by Matthijs Mekking to match recent serve-stale developments. (cherry picked from commit `0cdf85d204`)	2021-05-30 12:30:36 -07:00
Matthijs Mekking	a2ec3a1e4c	Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT from the options to prevent triggering the stale-answer-client-timeout on subsequent fetches. If we don't this may cause a crash when for example when prefetch is triggered after a query restart. (cherry picked from commit `c0dc5937c7`)	2021-05-30 00:33:42 -07:00
Evan Hunt	85ca29e5e2	clean up query correctly if already answered by serve-stale when a serve-stale answer has been sent, the client continues waiting for a proper answer. if a final completion event for the client does arrive, it can just be cleaned up without sending a response, similar to a canceled fetch. (cherry picked from commit `8bd8e995f1`)	2021-05-27 12:09:43 -07:00
Ondřej Surý	ac25fb9439	Use dns_name_copynf() with dns_message_gettempname() when needed dns_message_gettempname() returns an initialized name with a dedicated buffer, associated with a dns_fixedname object. Using dns_name_copynf() to write a name into this object will actually copy the name data from a source name. dns_name_clone() merely points target->ndata to source->ndata, so it is faster, but it can lead to a use-after-free if the source is freed before the target object is released via dns_message_puttempname(). In a few places, clone was being used where copynf should have been; this is now fixed. As a side note, no memory was lost, because the ndata buffer used in the dns_fixedname_t is internal to the structure, and is freed when the dns_fixedname_t is freed regardless of the .ndata contents. (cherry picked from commit `ce3e1abc1d`)	2021-05-22 07:17:01 +02:00
Evan Hunt	dccdb492ef	use a fixedname buffer in dns_message_gettempname() dns_message_gettempname() now returns a pointer to an initialized name associated with a dns_fixedname_t object. it is no longer necessary to allocate a buffer for temporary names associated with the message object. (cherry picked from commit `e31cc1eeb4`)	2021-05-22 07:13:57 +02:00
Matthijs Mekking	72deed194d	Use isdigit instead of checking character range When looking for key files, we could use isdigit rather than checking if the character is within the range [0-9]. Use (unsigned char) cast to ensure the value is representable in the unsigned char type (as suggested by the isdigit manpage). Change " & 0xff" occurrences to the recommended (unsigned char) type cast. (cherry picked from commit 1998ad6c776a9c17c27788b17765dee90d9e25df)	2021-05-05 18:23:53 +02:00
Mark Andrews	ea443fa9ba	Handle DNAME lookup via itself When answering a query, named should never attempt to add the same RRset to the ANSWER section more than once. However, such a situation may arise when chasing DNAME records: one of the DNAME records placed in the ANSWER section may turn out to be the final answer to a client query, but there is no way to know that in advance. Tweak the relevant INSIST assertion in query_respond() so that it handles this case properly. qctx->rdataset is freed later anyway, so there is no need to clean it up in query_respond().	2021-04-29 11:12:38 +02:00
Matthijs Mekking	4615cbb597	Serve-stale nit fixes While working on the serve-stale backports, I noticed the following oddities: 1. In the serve-stale system test, in one case we keep track of the time how long it took for dig to complete. In commit `aaed7f9d8c`, the code removed the exception to check for result == ISC_R_SUCCESS on stale found answers, and adjusted the test accordingly. This failed to update the time tracking accordingly. Move the t1/t2 time track variables back around the two dig commands to ensure the lookups resolved faster than the resolver-query-timeout. 2. We can remove the setting of NS_QUERYATTR_STALEOK and DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)" code path, because they are added later when we know we have actually found a stale answer on a stale timeout lookup. 3. We should clear the NS_QUERYATTR_STALEOK flag from the client query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that flag is set on the rdataset attributes). 4. In 'bin/named/config.c' we should set the configuration options in alpabetical order. 5. In the ARM, in the backports we have added "(stale)" between "cached" and "RRset" to make more clear a stale RRset may be returned in this scenario. (cherry picked from commit `104b676235`)	2021-04-28 13:53:52 +02:00
Matthijs Mekking	194a72b3f1	If RPZ config'd, bail stale-answer-client-timeout When we are recursing, RPZ processing is not allowed. But when we are performing a lookup due to "stale-answer-client-timeout", we are still recursing. This effectively means that RPZ processing is disabled on such a lookup. In this case, bail the "stale-answer-client-timeout" lookup and wait for recursion to complete, as we we can't perform the RPZ rewrite rules reliably. (cherry picked from commit `3d3a6415f7`)	2021-04-02 13:29:27 +02:00
Matthijs Mekking	29bcd113ea	Rename "staleonly" The dboption DNS_DBFIND_STALEONLY caused confusion because it implies we are looking for stale data only and ignore any active RRsets in the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear the option is related to a lookup due to "stale-answer-client-timeout". Rename other usages of "staleonly", instead use "lookup due to...". Also rename related function and variable names. (cherry picked from commit `839df94190`)	2021-04-02 13:29:17 +02:00
Matthijs Mekking	34dd6521b1	Restore the RECURSIONOK attribute after staleonly When doing a staleonly lookup we don't want to fallback to recursion. After all, there are obviously problems with recursion, otherwise we wouldn't do a staleonly lookup. When resuming from recursion however, we should restore the RECURSIONOK flag, allowing future required lookups for this client to recurse. (cherry picked from commit `3f81d79ffb`)	2021-04-02 13:29:09 +02:00
Matthijs Mekking	114dc7888a	Remove result exception on staleonly lookup When implementing "stale-answer-client-timeout", we decided that we should only return positive answers prematurely to clients. A negative response is not useful, and in that case it is better to wait for the recursion to complete. To do so, we check the result and if it is not ISC_R_SUCCESS, we decide that it is not good enough. However, there are more return codes that could lead to a positive answer (e.g. CNAME chains). This commit removes the exception and now uses the same logic that other stale lookups use to determine if we found a useful stale answer (stale_found == true). This means we can simplify two test cases in the serve-stale system test: nodata.example is no longer treated differently than data.example. (cherry picked from commit `aaed7f9d8c`)	2021-04-02 13:28:59 +02:00
Matthijs Mekking	06823aa255	Remove INSIST on NS_QUERYATTR_ANSWERED The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response twice. Without the attribute, this may happen if a staleonly lookup found a useful answer and sends a response to the client, and later recursion ends and also tries to send a response. The attribute was also used to mask adding a duplicate RRset. This is considered harmful. When we created a response to the client with a stale only lookup (regardless if we actually have send the response), we should clear the rdatasets that were added during that lookup. Mark such rdatasets with the a new attribute, DNS_RDATASETATTR_STALE_ADDED. Set a query attribute NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale only lookup. Before creating a response on a normal lookup, check if we can expect rdatasets to have been added during a staleonly lookup. If so, clear the rdatasets from the message with the attribute DNS_RDATASETATTR_STALE_ADDED set. (cherry picked from commit `3d5429f61f`)	2021-04-02 13:28:08 +02:00
Matthijs Mekking	33d61b9651	Simplify when to detach the client With stale-answer-client-timeout, we may send a response to the client, but we may want to hold on to the network manager handle, because recursion is going on in the background, or we need to refresh a stale RRset. Simplify the setting of 'nodetach': * During a staleonly lookup we should not detach the nmhandle, so just set it prior to 'query_lookup()'. * During a staleonly "stalefirst" lookup set the 'nodetach' to true if we are going to refresh the RRset. Now there is no longer the need to clear the 'nodetach' if we go through the "dbfind_stale", "stale_refresh_window", or "stale_only" paths. (cherry picked from commit `48b0dc159b`)	2021-04-02 13:28:01 +02:00
Matthijs Mekking	b1496d19d5	Refactor stale lookups, ignore active RRsets When doing a staleonly lookup, ignore active RRsets from cache. If we don't, we may add a duplicate RRset to the message, and hit an assertion failure in query.c because adding the duplicate RRset to the ANSWER section failed. This can happen on a race condition. When a client query is received, the recursion is started. When 'stale-answer-client-timeout' triggers around the same time the recursion completes, the following sequence of events may happen: 1. Queue the "try stale" fetch_callback() event to the client task. 2. Add the RRsets from the authoritative response to the cache. 3. Queue the "fetch complete" fetch_callback() event to the client task. 4. Execute the "try stale" fetch_callback(), which retrieves the just-inserted RRset from the database. 5. In "ns_query_done()" we are still recursing, but the "staleonly" query attribute has already been cleared. In other words, the query will resume when recursion ends (it already has ended but is still on the task queue). 6. Execute the "fetch complete" fetch_callback(). It finds the answer from recursion in the cache again and tries to add the duplicate to the answer section. This commit changes the logic for finding stale answers in the cache, such that on "stale_only" lookups actually only stale RRsets are considered. It refactors the code so that code paths for "dbfind_stale", "stale_refresh_window", and "stale_only" are more clear. First we call some generic code that applies in all three cases, formatting the domain name for logging purposes, increment the trystale stats, and check if we actually found stale data that we can use. The "dbfind_stale" lookup will return SERVFAIL if we didn't found a usable answer, otherwise we will continue with the lookup (query_gotanswer()). This is no different as before the introduction of "stale-answer-client-timeout" and "stale-refresh-time". The "stale_refresh_window" lookup is similar to the "dbfind_stale" lookup: return SERVFAIL if we didn't found a usable answer, otherwise continue with the lookup (query_gotanswer()). Finally the "stale_only" lookup. If the "stale_only" lookup was triggered because of an actual client timeout (stale-answer-client-timeout > 0), and if database lookup returned a stale usable RRset, trigger a response to the client. Otherwise return and wait until the recursion completes (or the resolver query times out). If the "stale_only" lookup is a "stale-anwer-client-timeout 0" lookup, preferring stale data over a lookup. In this case if there was no stale data, or the data was not a positive answer, retry the lookup with the stale options cleared, a.k.a. a normal lookup. Otherwise, continue with the lookup (query_gotanswer()) and refresh the stale RRset. This will trigger a response to the client, but will not detach the handle because a fetch will be created to refresh the RRset. (cherry picked from commit `92f7a67892`)	2021-04-02 13:27:52 +02:00
Matthijs Mekking	fcf8fb4f39	Keep track of allow client detach The stale-answer-client-timeout feature introduced a dependancy on when a client may be detached from the handle. The dboption DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads the meaning of this database option, and actually introduced a bug because the option was checked in other places. In particular, in 'ns_query_done()' there is a check for 'RECURSING(qctx->client) && (!QUERY_STALEONLY(&qctx->client->query) \|\| ...' and the condition is satisfied because recursion has not completed yet and DNS_DBFIND_STALEONLY is already cleared by that time (in query_lookup()), because we found a useful answer and we should detach the client from the handle after sending the response. Add a new boolean to the client structure to keep track of client detach from handle is allowed or not. It is only disallowed if we are in a staleonly lookup and we didn't found a useful answer. (cherry picked from commit `fee164243f`)	2021-04-02 13:27:43 +02:00
Matthijs Mekking	96953fc293	Fix servestale fetchlimits crash When we query the resolver for a domain name that is in the same zone for which is already one or more fetches outstanding, we could potentially hit the fetch limits. If so, recursion fails immediately for the incoming query and if serve-stale is enabled, we may try to return a stale answer. If the resolver is also is authoritative for the parent zone (for example the root zone), first a delegation is found, but we first check the cache for a better response. Nothing is found in the cache, so we try to recurse to find the answer to the query. Because of fetch-limits 'dns_resolver_createfetch()' returns an error, which 'ns_query_recurse()' propagates to the caller, 'query_delegation_recurse()'. Because serve-stale is enabled, 'query_usestale()' is called, setting 'qctx->db' to the cache db, but leaving 'qctx->version' untouched. Now 'query_lookup()' is called to search for stale data in the cache database with a non-NULL 'qctx->version' (which is set to a zone db version), and thus we hit an assertion in rbtdb. This crash was introduced in 'v9_16' by commit `2afaff75ed`. (cherry picked from commit `87591de6f7`)	2021-03-11 13:47:20 +01:00
Ondřej Surý	f92b77ff0d	Change the isc_thread_self() return type to uintptr_t The pthread_self(), thrd_current() or GetCurrentThreadId() could actually be a pointer, so we should rather convert the value into uintptr_t instead of unsigned long. (cherry picked from commit `a0181056a8`)	2021-02-26 21:14:17 +01:00
Matthijs Mekking	acc95d4e1d	Don't servfail on staleonly lookups When a staleonly lookup doesn't find a satisfying answer, it should not try to respond to the client. This is not true when the initial lookup is staleonly (that is when 'stale-answer-client-timeout' is set to 0), because no resolver fetch has been created at this point. In this case continue with the lookup normally. (cherry picked from commit `f8b7b597e9`)	2021-02-25 12:07:34 +01:00
Matthijs Mekking	84deb57bc3	Don't allow recursion on staleonly lookups Fix a crash that can happen in the following scenario: A client request is received. There is no data for it in the cache, (not even stale data). A resolver fetch is created as part of recursion. Some time later, the fetch still hasn't completed, and stale-answer-client-timeout is triggered. A staleonly lookup is started. It will also find no data in the cache. So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND, so this will call 'query_notfound()' and this will start recursion. We will eventually end up in 'ns_query_recurse()' and that requires the client query fetch to be NULL: REQUIRE(client->query.fetch == NULL); If the previously started fetch is still running this assertion fails. The crash is easily prevented by not requiring recursion for staleonly lookups. Also remove a redundant setting of the staleonly flag at the end of 'query_lookup_staleonly()' before destroying the query context. Add a system test to catch this case. (cherry picked from commit `9e061faaae`)	2021-02-25 12:07:27 +01:00
Matthijs Mekking	2afaff75ed	Use stale on error also when unable to recurse The 'query_usestale()' function was only called when in 'query_gotanswer()' and an unexpected error occurred. This may have been "quota reached", and thus we were in some cases returning stale data on fetch-limits (and if serve-stale enabled of course). But we can also hit fetch-limits when recursing because we are following a referral (in 'query_notfound()' and 'query_delegation_recurse()'). Here we should also check for using stale data in case an error occurred. Specifically don't check for using stale data when refetching a zero TTL RRset from cache. Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()' function to avoid code duplication. (cherry picked from commit `8bcd7fe69e`)	2021-02-08 16:10:03 +01:00
Matthijs Mekking	dbf5428629	Only start stale refresh window when resuming If we did not attempt a fetch due to fetch-limits, we should not start the stale-refresh-time window. Introduce a new flag DNS_DBFIND_STALESTART to differentiate between a resolver failure and unexpected error. If we are resuming, this indicates a resolver failure, then start the stale-refresh-time window, otherwise don't start the stale-refresh-time window, but still fall back to using stale data. (This commit also wraps some docstrings to 80 characters width) (cherry picked from commit `aabdedeae3`)	2021-02-08 16:07:43 +01:00

1 2 3 4 5

203 commits