bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-02-27 03:51:16 -05:00

Author	SHA1	Message	Date
Michał Kępień	fec4664ab0	Set "max-cache-size" in the "geoip2" system test The named configuration files used in the "geoip2" system test cause a rather large number of views (6-8) to be set up in each tested named instance. Each view has its own cache. Commit `aa72c31422` caused the RBT hash table to be pre-allocated to a size derived from "max-cache-size", so that it never needs to be rehashed. The size of that hash table is not expected to be significant enough to cause memory use issues in typical conditions even for large "max-cache-size" settings. However, these two factors combined can cause memory exhaustion issues in GitLab CI, where we run multiple "instances" of the test suite in parallel on the same runner, each test suite executes multiple system tests concurrently, and each system test may potentially start multiple named instances at the same time. In practice, this problem currently only seems to be affecting the "geoip2" system test, which is failing intermittently due to named instances used by that test getting killed by oom-killer. Prevent the "geoip2" system test from failing intermittently by setting "max-cache-size" in named configuration files used in that test to a low value in order to keep memory usage at bay even with a large number of views configured. (cherry picked from commit `4292d5bdfe`)	2020-08-05 11:08:24 +02:00
Matthijs Mekking	f3103660d0	keyword 'primaries' is unknown in 9.16 In 9.17 we introduced 'primaries' as a synonym for 'masters' in the configuration file. This synonym has not been backported so change the serve-stale test to make use of the 'masters' keyword.	2020-08-05 09:09:16 +02:00
Matthijs Mekking	c92de6cb44	stale-cache-enable is enabled by default Because this is a backport, the option should default to keeping serve-stale caching enabled.	2020-08-05 09:09:16 +02:00
Ondřej Surý	f3a7ee87ef	Add CHANGES and release notes for GL #1712 and GL #1829 (cherry picked from commit `dd62275152`)	2020-08-05 09:09:16 +02:00
Ondřej Surý	c4e6ade0e5	Add tests with stale-cache-disabled into serve-stale system test Add a fifth named (ns5) that runs with `stale-cache-enable no;` and check that there are no stale records in the cache. (cherry picked from commit `abc2ab9223`)	2020-08-05 09:09:16 +02:00
Ondřej Surý	b48e9ab201	Add stale-cache-enable option and disable serve-stale by default The current serve-stale implementation in BIND 9 stores all received records in the cache for a max-stale-ttl interval (default 12 hours). This allows DNS operators to turn on serve-stale answers in the event of a large authoritative DNS outage. Caching of stale answers needs to be enabled before the outage happens; otherwise the feature would be useless. The negative consequence of the default setting is the inevitable cache bloat for every DNS operator running named. In this MR, a new configuration option `stale-cache-enable` is introduced that allows operators to selectively enable or disable the serve-stale feature of BIND 9 as they see fit. The newly introduced option is disabled by default, i.e. serve-stale is disabled in the default configuration and has to be enabled if required. (cherry picked from commit `ce53db34d6`)	2020-08-05 09:09:16 +02:00
Mark Andrews	20bc6aefff	Check rcode is FORMERR (cherry picked from commit `88ff6b846c`)	2020-08-04 23:04:34 +10:00
Michał Kępień	e734651fbd	Only run system tests as root in developer mode Running system tests with root privileges is potentially dangerous. Only allow it when explicitly requested (by building with --enable-developer). (cherry picked from commit `3ef106f69d`)	2020-07-31 07:46:27 +02:00
Mark Andrews	14fe6e77a7	Always check the return from isc_refcount_decrement. Created the isc_refcount_decrement_expect macro to conditionally test the return value and ensure it is in the expected range. Converted unchecked isc_refcount_decrement calls to use isc_refcount_decrement_expect. Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect. (cherry picked from commit `bde5c7632a`)	2020-07-31 12:54:47 +10:00
Michal Nowak	0f319908f0	Remove cross-test dependency on ckdnsrps.sh	2020-07-30 16:25:23 +02:00
Michal Nowak	72a6b0dc6f	Fix name of the test directory of stop.pl in masterformat test	2020-07-30 16:24:18 +02:00
Michal Nowak	24f5f68d7a	Ensure test fails if packet.pl does not work as expected	2020-07-30 16:20:46 +02:00
Ondřej Surý	aa72c31422	Fix the rbt hashtable and grow it when setting max-cache-size There were several problems with the rbt hashtable implementation: 1. Our internal hashing function returns a uint64_t value, but it was silently truncated to unsigned int in the dns_name_hash() and dns_name_fullhash() functions. As the higher bits of the SipHash-2-4 output are more random, we need to use the upper half of the return value. 2. The hashtable implementation in rbt.c was using modulo to pick the slot number for the hash table. This has several problems because modulo is: a) slow, b) oblivious to patterns in the input data. This could lead to a very uneven distribution of the hashed data in the hashtable. Combined with the singly linked lists we use, it could really bog down the lookup and removal of nodes from the rbt tree[a]. Fibonacci hashing is a much better fit for the hashtable function here. For a longer description, read "Fibonacci Hashing: The Optimization that the World Forgot"[b] or just look at the Linux kernel. Also this will make Diego very happy :). 3. The hashtable would rehash every time the number of nodes in the rbt tree exceeded 3 * (hashtable size). The overcommit makes the uneven distribution in the hashtable even worse, but the main problem lies in the rehashing: every time the database grows beyond the limit, each subsequent rehashing will be much slower. The mitigation here is letting the rbt know how big the cache can grow and pre-allocating the hashtable to be big enough to never actually need to rehash. This will consume more memory at the start, but since the size of the hashtable is capped at `1 << 32` (i.e. about 4 billion entries), it will only consume a maximum of 32GB of memory for the hashtable in the worst case (and max-cache-size would need to be set to more than 4TB).
Calling dns_db_adjusthashsize() will also cap the maximum size of the hashtable to the pre-computed number of bits, so it won't try to consume more gigabytes of memory than is available for the database. FIXME: What is the average size of the rbt node that gets hashed? I chose the pagesize (4k) as the initial value to precompute the size of the hashtable, but the value is based on feeling and not any real data. For future work, there are more places where we use the result of the hash value modulo some small number, and those would benefit from Fibonacci hashing to get a better distribution. Notes: a. A doubly linked list should be used here to speed up the removal of the entries from the hashtable. b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/ (cherry picked from commit `e24bc324b4`)	2020-07-30 11:57:24 +02:00
Evan Hunt	bbc739b09b	report libuv version string in `named -V` (cherry picked from commit `1036338a10`)	2020-07-27 19:55:22 -07:00
Diego Fronza	1a101f223c	Add test for RPZ wildcard passthru ignored fix	2020-07-27 17:17:02 -03:00
Mark Andrews	b0942c2442	Check walking the hip rendezvous servers. Also fixes extraneous white space at end of record when there are no rendezvous servers. (cherry picked from commit `78db46d746`)	2020-07-24 15:24:49 +10:00
Petr Menšík	fade143531	Prevent crash on dst initialization failure The server might be created, but not yet fully initialized, when the fatal function is called. Check both the server and the task before attaching the exclusive task. (cherry picked from commit `c5e7152cf0`)	2020-07-23 11:28:11 +10:00
Michal Nowak	9509af7008	Check tests for core files regardless of test status Failed tests should be checked for core files et al. and have backtraces generated.	2020-07-20 13:09:06 +02:00
Michal Nowak	ace988990a	Rationalize backtrace logging The GDB backtrace generated via "thread apply all bt full" is too long for standard output; save it to a .txt file alongside the other log files.	2020-07-20 12:48:29 +02:00
Michal Nowak	c2bbe11349	Fold stop_servers_failed() to stop_servers()	2020-07-20 12:48:11 +02:00
Mark Andrews	90154d203b	Add regression test for [GL !3735] Check that resign interval is actually in days rather than hours by checking that RRSIGs are all within the allowed day range. (cherry picked from commit `11ecf7901b`)	2020-07-14 12:11:42 +10:00
Tony Finch	31005d61ae	Fix re-signing when `sig-validity-interval` has two arguments Since October 2019 I have had complaints from `dnssec-cds` reporting that the signatures on some of my test zones had expired. These were zones signed by BIND 9.15 or 9.17, with a DNSKEY TTL of 24h and `sig-validity-interval 10 8`. This is the same setup we have used for our production zones since 2015, which is intended to re-sign the zones every 2 days, keeping at least 8 days signature validity. The SOA expire interval is 7 days, so even in the presence of zone transfer problems, no-one should ever see expired signatures. (These timers are a bit too tight to be completely correct, because I should have increased the expiry timers when I increased the DNSKEY TTLs from 1h to 24h. But that should only matter when zone transfers are broken, which was not the case for the error reports that led to this patch.) For example, this morning my test zone contained: dev.dns.cam.ac.uk. 86400 IN RRSIG DNSKEY 13 5 86400 ( 20200701221418 20200621213022 ...) But one of my resolvers had cached: dev.dns.cam.ac.uk. 21424 IN RRSIG DNSKEY 13 5 86400 ( 20200622063022 20200612061136 ...) This TTL was captured at 20200622105807 so the resolver cached the RRset 64976 seconds previously (18h02m56s), at 20200621165511 only about 12h before expiry. The other symptom of this error was incorrect `resign` times in the output from `rndc zonestatus`. For example, I have configured a test zone: zone fast.dotat.at { file "../u/z/fast.dotat.at"; type primary; auto-dnssec maintain; sig-validity-interval 500 499; }; The zone is reset to a minimal zone containing only SOA and NS records, and when `named` starts it loads and signs the zone. After that, `rndc zonestatus` reports: next resign node: fast.dotat.at/NS next resign time: Fri, 28 May 2021 12:48:47 GMT The resign time should be within the next 24h, but instead it is near the signature expiry time, which the RRSIG(NS) says is 20210618074847.
(Note 499 hours is a bit more than 20 days.) May/June 2021 is less than 500 days from now because expiry time jitter is applied to the NS records. Using this test I bisected this bug to `09990672d` which contained a mistake leading to the resigning interval always being calculated in hours, when days are expected. This bug only occurs for configurations that use the two-argument form of `sig-validity-interval`. (cherry picked from commit `030674b2a3`)	2020-07-14 12:11:42 +10:00
Evan Hunt	fc73dbdc7d	make sure new_zone_lock is locked before unlocking it it was possible for the count_newzones() function to try to unlock view->new_zone_lock on return before locking it, which caused a crash on shutdown. (cherry picked from commit `ed37c63e2b`)	2020-07-13 23:53:14 +00:00
Mark Andrews	0265bd17d5	Fall back to built-in trust-anchors, managed-keys, or trusted-keys if the bind.keys file cannot be parsed. (cherry picked from commit `d02a14c795`)	2020-07-13 15:13:50 +10:00
Mark Andrews	7e62d76b6b	Don't verify the zone when setting expire to "now+1s", as verification can fail if too much wall-clock time has elapsed. Also capture signzone output for forensic analysis. (cherry picked from commit `a0e8a11cc6`)	2020-07-13 12:42:46 +10:00
Mark Andrews	86464e6e4b	Remove redundant check for listener being non-NULL (cherry picked from commit `c91dc92410`)	2020-07-13 10:28:34 +10:00
Michał Kępień	0bc4d6cc7a	Fix locking for LMDB 0.9.26 When "rndc reconfig" is run, named first configures a fresh set of views and then tears down the old views. Consider what happens for a single view with LMDB enabled; "envA" is the pointer to the LMDB environment used by the original/old version of the view, "envB" is the pointer to the same LMDB environment used by the new version of that view: 1. mdb_env_open(envA) is called when the view is first created. 2. "rndc reconfig" is called. 3. mdb_env_open(envB) is called for the new instance of the view. 4. mdb_env_close(envA) is called for the old instance of the view. This seems to have worked so far. However, an upstream change [1] in LMDB which will be part of its 0.9.26 release prevents the above sequence of calls from working as intended because the locktable mutexes will now get destroyed by the mdb_env_close() call in step 4 above, causing any subsequent mdb_txn_begin() calls to fail (because all of the above steps are happening within a single named process). Preventing the above scenario from happening would require either redesigning the way we use LMDB in BIND, which is not something we can easily backport, or redesigning the way BIND carries out its reconfiguration process, which would be an even more severe change. To work around the problem, set MDB_NOLOCK when calling mdb_env_open() to stop LMDB from controlling concurrent access to the database and do the necessary locking in named instead. Reuse the view->new_zone_lock mutex for this purpose to prevent the need for modifying struct dns_view (which would necessitate library API version bumps). Drop use of MDB_NOTLS as it is made redundant by MDB_NOLOCK: MDB_NOTLS only affects where LMDB reader locktable slots are stored while MDB_NOLOCK prevents the reader locktable from being used altogether. [1] `2fd44e3251` (cherry picked from commit `53120279b5`)	2020-07-10 11:30:31 +02:00
Matthijs Mekking	293d52341d	Increase "rndc dnssec -status" output size BUFSIZ (512 bytes on Windows) may not be enough to fit the status of a DNSSEC policy and three DNSSEC keys. Set the size of the relevant buffer to a hardcoded value of 4096 bytes, which should be enough for most scenarios. (cherry picked from commit `9347e7db7e`)	2020-07-03 15:13:50 +02:00
Ondřej Surý	b9b1366bf0	Add prereq.sh script to the shutdown system test The shutdown test requires python, pytest and dnspython.	2020-07-03 08:54:01 +02:00
Matthijs Mekking	de02eb55b5	Fix kasp test set_keytime Although the various keys in this policy are created and published at nearly the same time, there is a chance that one key is created a second later than another. The `set_keytimes_algorithm_policy` mistakenly set the keytimes for KEY3 based on the "published" time of KEY2. (cherry picked from commit `24e07ae98e`)	2020-07-02 04:56:20 +00:00
Suzanne Goldlust	4112b96d52	Fix formatting of See Also section header (cherry picked from commit `e3e787bc14`)	2020-07-01 23:46:39 +02:00
Diego Fronza	004849fd36	Added test for the fix This test ensures that named shuts down correctly when receiving multiple control connections after processing either "rndc stop" or "kill -SIGTERM" commands. Before the fix, named was crashing due to a race condition between two threads, one running shutdown logic in named/server.c and the other handling control logic in controlconf.c. This test tries to reproduce the above scenario by issuing multiple queries to a target named instance, issuing either an rndc stop or a kill -SIGTERM command to the same named instance, then starting multiple rndc status connections to ensure it no longer crashes. (cherry picked from commit `042e509753`)	2020-07-01 12:52:51 +02:00
Ondřej Surý	7c0fb5e492	Don't continue opening a new rndc connection if we are shutting down Due to lack of synchronization, whenever named was requested via rndc to stop, the controlconf.c module could be trying to access an already released pointer through named_g_server->interfacemgr in a separate thread. The race could only be triggered if named was being shut down and more rndc connections were occurring at the same time. This fix correctly checks if the server is shutting down before opening a new rndc connection. (cherry picked from commit `be6cc53ec2`)	2020-07-01 12:52:51 +02:00
Matthijs Mekking	f1b3686cd2	Output rndc dnssec -status Implement the 'rndc dnssec -status' command that will output some information about the key states, such as which policy is used for the zone, what keys are in use, and when rollover is scheduled. Add loose testing in the kasp system test; the actual times are already tested via key file inspection. (cherry picked from commit `19ce9ec1d4`)	2020-07-01 09:57:44 +02:00
Matthijs Mekking	34a9c3f6c9	Implement dummy 'rndc dnssec -status' command Add the code and documentation required to provide DNSSEC signing status through rndc. This does not yet show any useful information, just provide the command that will output some dummy string. (cherry picked from commit `e1ba1bea7c`)	2020-07-01 09:57:44 +02:00
Evan Hunt	6b00e5f5a0	update the acl system test to include a blackhole test case this ACL was previously untested, which allowed a regression to go undetected. (cherry picked from commit `e3ee138098`)	2020-06-30 19:41:42 -07:00
Michał Kępień	62f631f798	Silence PyYAML warning Make yaml.load_all() use yaml.SafeLoader to address a warning currently emitted when bin/tests/system/dnstap/ydump.py is run: ydump.py:28: YAMLLoadWarning: calling yaml.load_all() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. for l in yaml.load_all(f.stdout):	2020-06-30 11:19:47 +10:00
Mark Andrews	089df5f8ef	Add TOP_SRCDIR to conf.sh.in and conf.sh.win32	2020-06-29 23:50:45 +00:00
Mark Andrews	bf205b00c3	Check that 'rndc dnstap -roll <value>' works (cherry picked from commit `a289a57c7f`)	2020-06-29 22:30:01 +00:00
Ondřej Surý	9d876eccd3	Fix miscellaneous little bugs in RST formatting (cherry picked from commit `b51d10608e`)	2020-06-29 19:41:52 +02:00
Ondřej Surý	7196a64fdf	Add missing rndc.conf header that was breaking manpages section The rndc.conf main header was missing the header markup and that was breaking the TOC for all manpages in the ARM because sphinx-build incorrectly remembered the markup for subheader to be ~~~~ instead of ----. (cherry picked from commit `5c56a0ddbc`)	2020-06-29 19:41:15 +02:00
Matthijs Mekking	7eed00502f	kasp tests: fix wait for reconfig done The wait until zones are signed after rndc reconfig is broken because the zones are already signed before the reconfig. Fix by having a different way to ensure the signing of the zone is complete. This does require a call to the "wait_for_done_signing" function after each "check_keys" call after the ns6 reconfig. The "wait_for_done_signing" looks for a (newly added) debug log message that named will output if it is done signing with a certain key. (cherry picked from commit `a47192ed5b`)	2020-06-29 08:09:40 +02:00
Matthijs Mekking	60752f8092	kasp tests: Replace while loops with retry_quiet (cherry picked from commit `cf76d839ae`)	2020-06-29 08:09:32 +02:00
Evan Hunt	f171017570	append "0" to IPv6 addresses ending in "::" when printing YAML such addresses broke some YAML parsers. (cherry picked from commit `a8baf79e33`)	2020-06-25 18:57:06 -07:00
Matthijs Mekking	6d91799388	Add todo in dnssec system test for [GL #1689] Add a note explaining why we don't have a test case for the issue. It is tricky to write a good test case for this if our tools are not allowed to create signatures for unsupported algorithms. (cherry picked from commit `c6345fffe9`)	2020-06-25 22:44:19 +10:00
Mark Andrews	4885f0813e	Resize unamebuf[] to avoid warnings about snprintf() not having enough buffer space. Also change the named_os_uname() prototype so that it now returns (const char *) rather than (char *). If uname() is not supported on a UNIX build, prepopulate unamebuf[] with "unknown architecture". (cherry picked from commit `4bc3de070f`)	2020-06-25 09:26:22 +10:00
Evan Hunt	dca3658720	"check-names primary" and "check-names secondary" were ignored these keywords were added to the parser as synonyms for "master" and "slave" but were never hooked in to the configuration of named, so they were ignored. this has been fixed and the option is now checked for correctness. (cherry picked from commit `ba31b189b4`)	2020-06-22 14:30:14 +02:00
Mark Andrews	e522e80dc4	Add checking RFC 4592 responses examples to wildcard system test (cherry picked from commit 30586aa054c9cd8a4e64c91ed78683a4b54c79bc)	2020-06-18 10:18:42 +02:00
Mark Andrews	70c27df941	The dsset returned by dns_keynode_dsset needs to be thread safe. - clone keynode->dsset rather than return a pointer so that each thread uses its own independent copy. - hold a reference to the dsset (keynode) so it can't be deleted while in use. - create a new keynode when removing DS records so that dangling pointers to the deleted records will not occur. - use a rwlock when accessing the rdatalist to prevent instabilities when DS records are added. (cherry picked from commit `e5b2eca1d3`)	2020-06-11 16:09:43 +10:00
Mark Andrews	0c23582ffd	Improve the behaviour of yamlget.py when run with python2 (cherry picked from commit `9e72266705`)	2020-06-05 10:51:01 +10:00
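
Commit `14fe6e77a7` above replaces unchecked `isc_refcount_decrement()` calls with a checking macro. The following is a minimal C11 sketch of that idea, not BIND's code: `refcount_t`, `refcount_decrement()` and `refcount_decrement_expect()` are hypothetical stand-ins for the `isc_refcount` API, and the sketch assumes the decrement returns the count held *before* it was decremented (so a return of 1 means the last reference was just dropped).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for isc_refcount_t. */
typedef atomic_uint_fast32_t refcount_t;

/*
 * Atomically drop one reference and return the count held before the
 * decrement. acq_rel ordering lets the thread that drops the last
 * reference observe writes made by other holders before it cleans up.
 */
static inline uint_fast32_t
refcount_decrement(refcount_t *target) {
	uint_fast32_t prev = atomic_fetch_sub_explicit(target, 1,
						       memory_order_acq_rel);
	assert(prev > 0); /* more releases than acquisitions is a bug */
	return prev;
}

/*
 * Hypothetical analogue of the "expect" idea: decrement, then assert
 * that the previous value lies in [lo, hi] instead of ignoring it.
 */
static inline uint_fast32_t
refcount_decrement_expect(refcount_t *target, uint_fast32_t lo,
			  uint_fast32_t hi) {
	uint_fast32_t prev = refcount_decrement(target);
	assert(prev >= lo && prev <= hi);
	return prev;
}
```

A caller that knows it never holds the last reference can pass a range such as `[2, UINT_FAST32_MAX]`, so an unexpected final release aborts in a debug build instead of leaking the object.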

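Point 2 of commit `aa72c31422` replaces modulo slot selection with Fibonacci hashing. The sketch below shows the general technique rather than the code in rbt.c. It multiplies the hash by 2^64 divided by the golden ratio (rounded down), modulo 2^64, and keeps the top `bits` bits as the slot index in a table of 2^bits entries. `fibonacci_slot()` and `FIBONACCI_MULT` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* floor(2^64 / golden ratio): the usual 64-bit Fibonacci hashing multiplier. */
#define FIBONACCI_MULT UINT64_C(0x9E3779B97F4A7C15)

/*
 * Pick a slot in a table of 2^bits entries: multiply (wrapping mod 2^64)
 * and keep the top `bits` bits. Unlike `hash % size`, every input bit
 * can influence the chosen slot, and no division is needed.
 */
static uint32_t
fibonacci_slot(uint64_t hash, unsigned int bits) {
	assert(bits >= 1 && bits <= 32); /* a shift by 64 would be undefined */
	return (uint32_t)((hash * FIBONACCI_MULT) >> (64 - bits));
}
```

For example, the hashes 1024 and 2048 both fall into slot 0 of a 1024-slot table under `hash % 1024`, whereas `fibonacci_slot(1024, 10)` and `fibonacci_slot(2048, 10)` select different slots, because the multiplication carries low input bits up into the top bits that choose the slot.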

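Commit `31005d61ae` above traces the re-signing bug to the refresh interval being calculated in hours where days were expected. The simplified model below, which ignores jitter, shows the arithmetic. `resign_after_seconds()` is a hypothetical helper, not named's code, and it assumes the documented rule that the second `sig-validity-interval` value is in days when the first exceeds 7 days and in hours otherwise.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_HOUR UINT32_C(3600)
#define SECS_PER_DAY  UINT32_C(86400)

/*
 * Simplified model of `sig-validity-interval base refresh`: signatures
 * are valid for `base` days and are regenerated `refresh` units before
 * they expire (days if base > 7, otherwise hours). Returns how many
 * seconds after signing an RRset becomes due for re-signing.
 */
static uint32_t
resign_after_seconds(uint32_t base_days, uint32_t refresh) {
	uint32_t validity = base_days * SECS_PER_DAY;
	uint32_t unit = (base_days > 7) ? SECS_PER_DAY : SECS_PER_HOUR;
	uint32_t refresh_secs = refresh * unit;

	assert(base_days > 0);
	/* A refresh window at least as long as the validity: re-sign now. */
	return (refresh_secs >= validity) ? 0 : validity - refresh_secs;
}
```

With `10 8` the model re-signs 2 days after signing, and with `500 499` after 1 day, matching the intent described in the commit. Treating the refresh value as hours instead turns `10 8` into 10 days minus 8 hours (9 days 16 hours), which schedules re-signing close to expiry, consistent with the reported symptoms.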
9740 commits
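
Commit `f171017570` above appends "0" to IPv6 addresses ending in "::" before printing them as YAML; presumably the trailing colon can be read as a mapping indicator by some parsers. The sketch below illustrates the transformation with the POSIX `inet_ntop()` function. `ipv6_to_yaml_text()` is a hypothetical name, not the function changed by the commit.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Render an IPv6 address as text for YAML output. If the text form ends
 * in "::" (e.g. "2001:db8::"), append "0" so the scalar no longer ends
 * in a colon. Returns false if the buffer is too small.
 */
static bool
ipv6_to_yaml_text(const struct in6_addr *addr, char *buf, size_t size) {
	assert(addr != NULL && buf != NULL);
	if (inet_ntop(AF_INET6, addr, buf, (socklen_t)size) == NULL) {
		return false;
	}
	size_t len = strlen(buf);
	if (len >= 2 && strcmp(buf + len - 2, "::") == 0) {
		if (len + 2 > size) {
			return false; /* no room for "0" plus the terminator */
		}
		buf[len] = '0';
		buf[len + 1] = '\0';
	}
	return true;
}
```

Appending "0" keeps the address valid (`2001:db8::0` denotes the same address as `2001:db8::`), so consumers that parse it back with `inet_pton()` are unaffected.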