The dotat() function now sends the TAT query asynchronously: it
initializes the data first and then schedules the TAT send, so the
send no longer takes part in a lock-order loop.
This breaks the following lock-order loops:
zone->lock (dns_zone_setviewcommit) while holding view->lock
(dns_view_setviewcommit)
keytable->lock (dns_keytable_find) while holding zone->lock
(zone_asyncload)
view->lock (dns_view_findzonecut) while holding keytable->lock
(dns_keytable_forall)
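The three dependencies above form a cycle, which a small Python sketch can make explicit. This is an illustration only, not BIND code: find_cycle and EDGES are hypothetical names, and the sketch assumes that sending the TAT query asynchronously is what removes the last edge (view->lock taken while holding keytable->lock).

```python
# Each edge is (lock already held, lock then acquired), as listed above.
EDGES = [
    ("view->lock", "zone->lock"),
    ("zone->lock", "keytable->lock"),
    ("keytable->lock", "view->lock"),
]


def find_cycle(edges):
    """Return one lock-order cycle as a list of lock names, or None."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the path from nxt to node closes a loop.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

With all three edges present, find_cycle(EDGES) reports a loop; with the assumed third edge dropped, find_cycle(EDGES[:2]) returns None.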
(cherry picked from commit 3c4b68af7c)
Each dns_rpz_zone_t structure keeps a hash table of the names this RPZ
database contains. Here is what happens when an RPZ is updated:
- a new hash table is prepared for the new version of the RPZ by
iterating over it; each name found is added to the summary RPZ
database,
- every name added to the new hash table is searched for in the old
hash table; if found, it is removed from the old hash table,
- the old hash table is iterated over; all names found in it are
removed from the summary RPZ database (because at that point the old
hash table should only contain names which are not present in the
new version of the RPZ),
- the new hash table replaces the old hash table.
When the new version of the RPZ is iterated over, if a given name is
spelled using a different letter case than in the old version of the
RPZ, the new variant will hash to a different value than the old
variant, which means it will not be removed from the old hash table.
When the old hash table is subsequently iterated over to remove
seemingly deleted names, the old variant of the name will still be
there, causing the name to be deleted from the summary RPZ database
(which effectively causes a given rule to be ignored).
The issue can be triggered not just by altering the case of existing
names in an RPZ, but also by adding sibling names spelled with a
different letter case. This is because RBT code preserves case when
node splitting occurs. The end result is that when the RPZ is iterated
over, a given name may be using a different case than in the zone file
(or XFR contents).
Fix by downcasing all names found in the RPZ database before adding them
to the summary RPZ database.
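The update sequence and the effect of the fix can be sketched in Python. This is a simplified model, not the BIND implementation: Python sets stand in for the per-zone hash tables, the summary database is modeled as a case-insensitive set (names stored lowercased), and update_rpz and its parameters are hypothetical names.

```python
def update_rpz(summary, old_names, zone_names, downcase):
    """Apply one RPZ update; return the new per-zone name table.

    summary    -- set modeling the (case-insensitive) summary RPZ database
    old_names  -- case-sensitive name table from the previous version
    zone_names -- names found while iterating over the new version
    downcase   -- True models the fix, False models the old behavior
    """
    old_names = set(old_names)
    new_names = set()
    for name in zone_names:
        key = name.lower() if downcase else name
        new_names.add(key)
        summary.add(key.lower())
        # A differently-cased key does not match, so it stays in the old table.
        old_names.discard(key)
    # Whatever is left in the old table is treated as deleted.
    for stale in old_names:
        summary.discard(stale.lower())
    return new_names
```

Without downcasing, an update that changes "Example.com" to "example.com" leaves the old spelling in the old table, and its removal then deletes the rule from the summary. With downcasing, both versions map to the same key and the rule survives.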
(cherry picked from commit dc8a7791bd)
some versions of perl failed to run packet.pl because the 'last'
keyword can't be used outside of a loop block. this commit changes
the packet dumping code to a function so we can use 'return' instead.
(cherry picked from commit bf9aee1b88)
the tcp system test uses the 'packet.pl' test tool to send a packet
thousands of times. this took a long time because the tool was waiting
for replies and parsing them; however, for that particular test the
replies aren't relevant.
this commit uses non-blocking sockets and moves the reply parsing
outside the send loop, which speeds the system test up substantially.
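The general pattern (send everything on a non-blocking socket, read replies only after the send loop) can be sketched as follows. This is a Python analogue for illustration, not the packet.pl code; send_repeated and drain_replies are hypothetical helpers.

```python
import select
import socket


def send_repeated(sock, payload, repeats):
    """Send payload `repeats` times without waiting for any reply."""
    sock.setblocking(False)
    sent_total = 0
    for _ in range(repeats):
        view = memoryview(payload)
        while view:
            try:
                sent = sock.send(view)
            except BlockingIOError:
                # Send buffer is full: wait until the socket is writable.
                select.select([], [sock], [])
                continue
            view = view[sent:]
            sent_total += sent
    return sent_total


def drain_replies(sock, timeout=0.1):
    """Collect whatever replies have arrived, after the send loop is done."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            break
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            continue
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```

A caller that does not care about replies can skip drain_replies entirely, which is the case the commit message describes for the tcp system test.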
(cherry picked from commit 1ceea908b6)
The "huge.zone" zone can take longer than 100 seconds to load when
running under a sanitizer. Increase the relevant zone load timeout to
prevent intermittent failures of the "rndc" system test.
(cherry picked from commit fd08918df5)
The test works as follows:
1. The client wants to resolve an unusual ip6.arpa. name:
test1.test2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. IN TXT
2. The query is sent to ns7, a qmin-enabled resolver.
3. ns7 does the first stage of query minimization for the name and sends a new
query to root (ns1):
_.1.0.0.2.ip6.arpa. IN A
4. ns1 delegates ip6.arpa. to ns2.good.:
;; AUTHORITY SECTION:
;ip6.arpa. 20 IN NS ns2.good.
;; ADDITIONAL SECTION:
;ns2.good. 20 IN A 10.53.0.2
5. ns7 does a second round of minimizing the name and sends a new query
to ns2.good. (10.53.0.2):
_.8.2.6.0.1.0.0.2.ip6.arpa. IN A
6. ans2 delegates 8.2.6.0.1.0.0.2.ip6.arpa. to ns3.good.:
;; AUTHORITY SECTION:
;8.2.6.0.1.0.0.2.ip6.arpa. 60 IN NS ns3.good.
;; ADDITIONAL SECTION:
;ns3.good. 60 IN A 10.53.0.3
7. ns7 does a third round of minimizing the name and sends a new query to
ns3.good.:
_.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. IN A
8. ans3 delegates 1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. to ns4.good.:
;; AUTHORITY SECTION:
;1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. 60 IN NS ns4.good.
;; ADDITIONAL SECTION:
;ns4.good. 60 IN A 10.53.0.4
9. ns7 does a fourth round of minimizing the name and sends a new query to
ns4.good.:
_.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. IN A
10. ns4.good. does not know this name, but answers stating that it is authoritative for
the domain:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53815
...
;; AUTHORITY SECTION:
1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. 60 IN SOA ns4.good. ...
11. ns7 does another minimization round on the name:
_.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa
It sends this query to ns4.good. and gets the same SOA response described in item #10.
12. ns7 does another minimization round on the name:
_.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa
It sends this query to ns4.good. and gets the same SOA response described in item #10.
13. ns7 reaches the last query minimization step for the ip6.arpa. QNAME.
After all IPv6 labels are exhausted, the algorithm falls back to the
original QNAME:
test1.test2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa
ns7 sends a new query with the original QNAME to ans4.
14. Finally, ans4 answers with the expected response:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40969
;; flags: qr aa; QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 8192
;; QUESTION SECTION:
;test1.test2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. IN TXT
;; ANSWER SECTION:
;test1.test2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa. 1 IN TXT "long_ip6_name"
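The sequence of minimized names in the walkthrough above can be reproduced with a short Python sketch. The nibble boundaries (4, 8, 12, 14, 16 and 32) are read off the queries listed in steps 3 to 12 rather than taken from the resolver source, and minimized_qnames is a hypothetical helper.

```python
# Number of ip6.arpa nibble labels kept at each step, as seen in the test.
NIBBLE_STOPS = (4, 8, 12, 14, 16, 32)


def minimized_qnames(qname):
    """Return the names queried for an ip6.arpa QNAME, in order."""
    labels = qname.rstrip(".").split(".")
    suffix = labels[-2:]
    if [label.lower() for label in suffix] != ["ip6", "arpa"]:
        raise ValueError("not an ip6.arpa name: %r" % qname)
    prefix = labels[:-2]

    names = []
    for stop in NIBBLE_STOPS:
        if stop >= len(prefix):
            break
        kept = prefix[len(prefix) - stop:]
        names.append(".".join(["_"] + kept + suffix) + ".")
    # Once the stops are exhausted, fall back to the original QNAME.
    names.append(qname)
    return names


QNAME = (
    "test1.test2."
    "0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0."
    "9.0.9.4.1.1.1.1.8.2.6.0.1.0.0.2.ip6.arpa."
)
```

Calling minimized_qnames(QNAME) yields the six "_." names from the steps above followed by the original QNAME. In the test the intermediate queries use type A and only the final one uses TXT; the sketch models the names only.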
(cherry picked from commit 11add69198)
Log when named decides to add a CDS/CDNSKEY record to the zone. This
logging is how the bug fixed in the previous commits was found.
(cherry picked from commit f9ef5120c1)
The CDS/CDNSKEY record will be published when the DS is in the
RUMOURED state. However, with the introduction of the rndc '-checkds'
command, the logic in the keymgr was changed to prevent the DS
state from moving to RUMOURED unless the specific command was given.
Hence, the CDS was never published before it was seen in the parent.
Initially I thought this was a policy approval rule; however, it is
actually a DNSSEC timing rule. Remove the restriction from
'keymgr_policy_approval' and update the 'keymgr_transition_time'
function. When looking to move the DS state to OMNIPRESENT it will
no longer calculate the time from the last state change, but from when
the DS was seen in the parent, "DS Publish". If that time is not set,
default to a next key event of one hour.
Similarly for moving the DS state to HIDDEN, the time to wait will
be derived from the "DS Delete" time, not from when the DS state
last changed.
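The timing rule can be sketched in Python as follows. This is not the 'keymgr_transition_time' code: ds_transition_time is a hypothetical helper, the wait interval is passed in as an opaque "delay" because its derivation is not described here, and the one-hour fallback is assumed to be counted from the current time.

```python
from datetime import datetime, timedelta, timezone


def ds_transition_time(now, ds_seen, delay):
    """Return when the DS state may next change.

    now     -- current time
    ds_seen -- "DS Publish" time (for OMNIPRESENT) or "DS Delete" time
               (for HIDDEN); None if it was never set
    delay   -- hypothetical wait interval after the DS event
    """
    if ds_seen is None:
        # No DS event recorded: check again at a next key event in an hour.
        return now + timedelta(hours=1)
    # Count from when the DS was seen, not from the last state change.
    return ds_seen + delay
```

For example, with a DS seen two hours ago and a 26-hour delay, the transition is due 24 hours from now, regardless of when the DS state last changed.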
(cherry picked from commit c8205bfa0e)
The 'rndc_checkds' utility now allows "now" as the time when the DS
has been seen in/seen removed from the parent.
Also it uses "KEYX" as the key argument, rather than key id.
The 'rndc_checkds' will retrieve the key from the "KEYX" string. This
makes the call a bit more readable.
(cherry picked from commit dd754a974c)
This commit has a lot of updates on comments, mainly to make the
system test more readable.
Also remove some redundant signing policy checks (check_keys,
check_dnssecstatus, check_keytimes).
Finally, move key time checks and expected key time settings above
'rndc_checkds' calls (with the new way of testing next key event
times there is no need to do them after 'rndc_checkds', and moving
them above 'rndc_checkds' makes the flow of testing easier to follow).
(cherry picked from commit 8cb394e047)
Add the new '-P ds' and '-D ds' calls to the kasp test setup so that
next key event times can reliably be tested.
(cherry picked from commit 4a67cdabfe)
Add two more arguments to the dnssec-settime tool. '-P ds' sets the
time that the DS was published in the parent, '-D ds' sets the time
that the DS was removed from the parent (these times are not accurate,
but rely on the user to use them appropriately, and as long as the
time is not before actual publication/withdrawal, it is fine).
These new arguments are needed for the kasp system test. We want to
test when the next key event is once a DS is published, and now
that 'parent-registration-delay' is obsolete, we need a different
approach to reliably test the timings.
(cherry picked from commit d4c4f6a669)
The test for assertion failure via large TCP packet needs to be repeated
multiple times (we use 300000). This commit fixes the input file to be
properly hexlified and uses the new packet.pl -r feature to send it
300000 times via TCP.
(cherry picked from commit 5f6eb014aa)
For some tests, we need to send big data streams (for TCP) or repeated
packets (for UDP). This commit adds a `-r` option to packet.pl that sends
the same input <repeats> times using the specified protocol.
(cherry picked from commit dd46559a19)
In order to lower the amount of memory allocated at startup by named
instances used in the BIND system test suite, set the default value of
"max-cache-size" for these to 2 megabytes. The purpose of this change
is to prevent named instances (or even entire virtual machines) from
getting killed by the operating system on the test host due to excessive
memory use.
Remove all "max-cache-size" statements from named configuration files
used in system tests ("checkconf" notwithstanding) to prevent confusion
as the "-T maxcachesize=..." command line option takes precedence over
configuration files.
(cherry picked from commit dad6572093)
An implicit default of "max-cache-size 90%;" may cause memory use issues
on hosts which run numerous named instances in parallel (e.g. GitLab CI
runners) due to the cache RBT hash table now being pre-allocated [1] at
startup. Add a new command line option, "-T maxcachesize=...", to allow
the default value of "max-cache-size" to be overridden at runtime. When
this new option is in effect, it overrides any other "max-cache-size"
setting in the configuration, either implicit or explicit. This
approach was chosen because it is arguably the simplest one to
implement.
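The resulting precedence can be summarized in a few lines of Python. This is a sketch of the rule as described above, not the named implementation: effective_max_cache_size is a hypothetical helper and the values are treated as opaque strings.

```python
IMPLICIT_DEFAULT = "90%"


def effective_max_cache_size(cli_override=None, configured=None):
    """Pick the "max-cache-size" value that takes effect.

    cli_override -- value given with "-T maxcachesize=...", or None
    configured   -- explicit "max-cache-size" from the configuration, or None
    """
    if cli_override is not None:
        # The command line option beats both explicit and implicit settings.
        return cli_override
    if configured is not None:
        return configured
    return IMPLICIT_DEFAULT
```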
The following alternative approaches to solving this problem were
considered and ultimately rejected (after it was decided they were not
worth the extra code complexity):
- adding the same command line option, but making explicit
configuration statements have priority over it,
- adding a build-time option that allows the implicit default of
"max-cache-size 90%;" to be overridden.
[1] see commit aa72c31422
(cherry picked from commit 9ac1f6a9bc)
Prevent intermittent false positives on slow platforms by subtracting
the number of seconds which passed between key creation and invoking
'rndc dnssec -checkds'.
This particularly fails for the step3.csk-roll2.autosign zone because
the closest next key event is when the zone signatures become
omnipresent. Running 'rndc dnssec -checkds' some time later means
that the next key event is in fact closer than the calculated time
and thus we need to adjust the expected time by the time already
passed.
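The adjustment amounts to a single subtraction, sketched here in Python. The function and parameter names are hypothetical, and times are plain seconds.

```python
def expected_next_key_event(calculated, key_created, checkds_invoked):
    """Adjust the expected next key event for time that already passed.

    calculated      -- seconds until the next key event, counted from
                       key creation
    key_created     -- timestamp (seconds) when the key was created
    checkds_invoked -- timestamp (seconds) when 'rndc dnssec -checkds' ran
    """
    elapsed = checkds_invoked - key_created
    return calculated - elapsed
```

On a slow platform where the command runs 7 seconds after key creation, a calculated interval of 3600 seconds becomes an expected 3593 seconds.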
(cherry picked from commit 262b52a154)
Previously, .txt files containing a full backtrace could be
misidentified as a crashed test:
I:Core dumps were found for the following system tests:
I: core.19948-backtrace.txt
I: shutdown
Now .txt files are removed from the list.
Change 'run.sh.in' to match the core matching pattern in
'testsummary.sh'.
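The filtering idea can be sketched in Python. The real logic lives in the shell scripts named above; the "core*" glob used here is an assumption for illustration and is not copied from 'testsummary.sh', and core_dumps is a hypothetical helper.

```python
from fnmatch import fnmatch


def core_dumps(filenames):
    """Keep files that look like core dumps, ignoring backtrace .txt files."""
    return [
        name
        for name in filenames
        if fnmatch(name, "core*") and not name.endswith(".txt")
    ]
```

With this filter, a directory holding only "core.19948-backtrace.txt" is no longer reported as containing a core dump.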
(cherry picked from commit c2dcd95966)
In the rare case that you have multiple keys acting as KSK and that
have the same keytag, you can now set the algorithm when calling
'-checkds'.
(cherry picked from commit 46fcd927e7)
Make sure the 'checkds' command correctly sets the right key timing
metadata and also make sure that it rejects setting the key timing
metadata if there are multiple keys with the KSK role and no key
identifier is provided.
(cherry picked from commit a43bb41909)
With 'checkds' replacing 'parent-registration-delay', the kasp
test needs the expected times to be adjusted. Also the system test
needs to call 'rndc dnssec -checkds' to progress the rollovers.
Since we pretend that the KSK is active as soon as the DS is
submitted (and parent registration delay is no longer applicable)
we can simplify the 'csk_rollover_predecessor_keytimes' function
to take only one "addtime" parameter.
This commit also slightly changes the 'check_dnssecstatus' function,
passing the zone as a parameter.
(cherry picked from commit 38cb43bc86)
Add a new 'rndc' command 'dnssec -checkds' that allows the user to
signal named that a new DS record has been seen published in the
parent, or that an existing DS record has been withdrawn from the
parent.
Upon the 'checkds' request, 'named' will write out the new state for
the key, updating the 'DSPublish' or 'DSRemoved' timing metadata.
This replaces the "parent-registration-delay" configuration option,
which was unreliable because it was purely time based (if the user
did not actually submit the new DS to the parent, for example, this
could result in an invalid DNSSEC state).
Because we cannot rely on the parent registration delay for state
transition, we need to replace it with a different guard. Instead,
if a key wants its DS state to be moved to RUMOURED, the "DSPublish"
time must be set and must not be in the future. If a key wants its
DS state to be moved to UNRETENTIVE, the "DSRemoved" time must be set
and must not be in the future.
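The new guard can be sketched in Python as follows. This is an illustration of the rule stated above, not the keymgr code; ds_state_change_allowed is a hypothetical helper, and states other than RUMOURED and UNRETENTIVE are assumed not to be covered by this guard.

```python
from datetime import datetime, timedelta, timezone


def ds_state_change_allowed(target_state, ds_publish, ds_removed, now):
    """Check the guard on moving a key's DS state.

    ds_publish / ds_removed -- "DSPublish" / "DSRemoved" timing metadata,
                               or None when not set
    """
    if target_state == "RUMOURED":
        when = ds_publish
    elif target_state == "UNRETENTIVE":
        when = ds_removed
    else:
        # Assumption: other transitions are not restricted by this guard.
        return True
    # The relevant time must be set and must not lie in the future.
    return when is not None and when <= now
```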
By default, with '-checkds' you set the time that the DS has been
published or withdrawn to now, but you can set a different time with
'-when'. If there is only one KSK for the zone, that key has its
DS state moved to RUMOURED. If there are multiple keys for the zone,
specify the right key with '-key'.
(cherry picked from commit 04d8fc0143)