The dns_glue struct currently contains four dns_rdataset structs to hold
the glue. These structs are over 100 bytes each because they need to be
able to hold data for multiple types of databases.
Since the dns_glue_t type is only used by qpzone, we can instead hold
pointers to the vecheaders directly, and only bind the vecheaders to
the rdatasets when adding the glue to the message.
The dns_glue_t, dns_gluelist_t and dns_glue_additionaldata_ctx types are
only used in qpzone.c. This commit moves them to the private header
qpzone_p.h.
This is done in preparation for a follow-up commit that will refactor them
to use types that are private to qpzone.
When a validator is being shut down, the associated name
`val->name` is set to NULL. This could cause a crash if a worker
thread subsequently added an EDE code to the response containing
val->name in the extra text.
`validator_addede()` now checks whether the name is NULL before
trying to add it to the extra text.
The allow-transfer/allow-query catalog zone custom properties support
only APL RRtypes. All other types are correctly rejected by the
catz_process_apl() function. However, if that function has already
processed an APL RRtype and is then called for another (non-APL)
RRtype, an assertion fails in the prologue of the function because
`*aclbp != NULL` (i.e. an APL has already been processed). Move the
type-checking code before the affected REQUIRE assertion.
The DNS64 state information stored in client->query.dns64_aaaaok
could cause an assertion failure in query_respond() if the server
was configured in such a way as to trigger a new recursion before
the query had been reset - for example, by using the filter-aaaa
plugin, which may need to recurse to find out whether an A record
exists.
This has been addressed by clearing DNS64 state information
immediately after the call to query_filter64().
In previous_closest_nsec(), a new qpreader was opened to search the NSEC
tree. It was possible for that to be used to update a QP iterator object
owned by the caller, and then be destroyed when the function returned.
This qpreader object isn't necessary anymore; since namespaces were
added to the QP trie in commit 15653c54a0, we can now just reuse the
existing reader for the main tree.
Each dns__nta_t now references its parent ntatable in nta_create() and
releases it in dns__nta_destroy(). This avoids a use-after-free in
fetch_done() and other callbacks that dereference nta->ntatable: the
ntatable could otherwise be released by view destruction while an
in-flight resolver fetch still holds a reference to the NTA.
makeslab(), makevec(), dns_rdatavec_merge() and dns_rdatavec_subtract()
summed per-record storage into an unsigned int with no upper-bound
check. An RRset whose total encoded size exceeds DNS_RDATA_MAXLENGTH
cannot fit in a DNS message and is unservable; building its in-memory
representation only burns memory on data that will fail at response
time, and at the upper bound the running sum could in theory wrap.
Cap the running total at DNS_RDATA_MAXLENGTH and return ISC_R_NOSPACE
when exceeded. Update the qpdb cache memory-purge test to use a
record size that fits within the new limit.
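A minimal sketch of the capped accumulation, not the actual makeslab()
or makevec() code. The sketch_* names are hypothetical, and the value
given for the cap is an assumption; the real constant comes from the
BIND 9 headers. Comparing each length against the remaining headroom
means the running total can never wrap.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define SKETCH_RDATA_MAXLENGTH 65512U

/* Hypothetical stand-ins for ISC_R_SUCCESS and ISC_R_NOSPACE. */
enum sketch_result { SKETCH_R_SUCCESS = 0, SKETCH_R_NOSPACE = 1 };

/*
 * Sum per-record lengths into *totalp, refusing as soon as the total
 * would exceed the cap. The check is done against the remaining
 * headroom, so 'total' never exceeds SKETCH_RDATA_MAXLENGTH.
 */
static enum sketch_result
sketch_total_length(const unsigned int *lengths, size_t count,
		    unsigned int *totalp) {
	unsigned int total = 0;

	assert(totalp != NULL);

	for (size_t i = 0; i < count; i++) {
		if (lengths[i] > SKETCH_RDATA_MAXLENGTH - total) {
			return SKETCH_R_NOSPACE;
		}
		total += lengths[i];
	}

	*totalp = total;
	return SKETCH_R_SUCCESS;
}
```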
Assisted-by: Claude:claude-opus-4-7
RFC 3445 also eliminated the DNS_KEYTYPE_NOAUTH, DNS_KEYTYPE_NOCONF,
and DNS_KEYOWNER_ENTITY flags. With NOAUTH and NOCONF gone, the
concept of NOKEY can no longer be expressed in KEY records.
DNS_KEYOWNER_ENTITY was already unused as of 22d688f656 but still
defined; that is now also removed.
The DNS_KEYFLAG_EXTENDED flag was only legitimate for type KEY
and was eliminated by RFC 3445. Dropping the extended-flags
handling in pub_compare() also fixes a possible crash when
signing a zone whose journal contains a crafted DNSKEY: a
6-byte record with the EXTENDED bit set produced a memmove()
length that underflowed and ran off a stack buffer.
The previous hash_key() was a deterministic, unkeyed (<<1) + add over the
key words. An off-path attacker could invert it offline and submit
queries whose source /24, qname hash, and qtype map to a single bucket;
under chaining this turns every lookup into an O(N) walk under
rrl->lock and starves legitimate query processing on the very feature
deployed to mitigate DoS.
Replace it with isc_hash32(), which is HalfSipHash-2-4 keyed by a
per-process random seed, so collision sets cannot be precomputed.
Assisted-by: Claude:claude-opus-4-7
Once the walk reaches the root, splitting one more label off would
trip an internal assertion and abort named. Stop cleanly with
ISC_R_NOTFOUND so the dispatcher cancels the fetch. Only reachable
through misconfiguration (root configured as a primary with parental
agents, or a parent zone that NODATAs its own NS).
Assisted-by: Claude:claude-opus-4-7
The dns_adbfind_t lifetime model has no reference counting; storage
liveness is held together by find->lock and the FIND_EVENT_SENT
idempotency flag, plus an unwritten cross-module rule that all
non-trivial operations on a find run on find->loop. If a caller
violates that rule, the unlock-relock window in dns_adb_cancelfind
(and similar paths) becomes a use-after-free and we crash later
inside libpthread on a corrupted mutex.
Add REQUIREs at dns_adb_cancelfind, dns_adb_destroyfind and
find_sendevent so a violation aborts at the offending call site
rather than silently freeing storage another loop is still touching.
Also poison find->magic with ~DNS_ADBFIND_MAGIC in free_adbfind so
DNS_ADBFIND_VALID catches reuse-after-free at the next public entry
point instead of letting the dangling pointer reach the mutex code.
Assisted-by: Claude:claude-opus-4-7
The wire-format RSA DNSKEY parser used the residual rdata length after
the exponent as the modulus length, with no positive lower bound. A
crafted DNSKEY whose declared exponent length consumed the whole buffer
produced n = 0; the BN_bin2bn(_, 0, _) returned a non-NULL BIGNUM, the
NULL-check passed, and dnssec-importkey -f wrote out a "valid" key with
no key material. RSASHA1 also bypassed the algorithm-specific lower
bound in opensslrsa_createctx (which only checks an upper bound for the
SHA1 algorithms), so the degenerate key reached the verify path with
whatever behaviour the linked OpenSSL exhibits for n = 0.
Add OPENSSLRSA_MIN_MODULUS_BITS = 512 (the lowest legitimate modulus
across the RSA DNSSEC algorithms per RFC 5702) and reject smaller
moduli at parse time in opensslrsa_fromdns, opensslrsa_parse, and
opensslrsa_fromlabel — the same three load paths where the existing
exponent upper-bound check lives.
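A minimal sketch of the lower-bound check, assuming the modulus is
available as a big-endian byte string. The sketch_* names are
hypothetical; the real code works on OpenSSL BIGNUMs rather than raw
bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MIN_MODULUS_BITS 512U

/* Number of significant bits in a big-endian unsigned integer. */
static size_t
sketch_bits(const unsigned char *buf, size_t len) {
	size_t i = 0;

	assert(buf != NULL || len == 0);

	/* Skip leading zero bytes; an all-zero or empty value has 0 bits. */
	while (i < len && buf[i] == 0) {
		i++;
	}
	if (i == len) {
		return 0;
	}

	size_t bits = (len - i) * 8;
	unsigned char top = buf[i];

	/* Discount the leading zero bits of the first non-zero byte. */
	while ((top & 0x80) == 0) {
		top <<= 1;
		bits--;
	}
	return bits;
}

/* Reject empty and undersized moduli at parse time. */
static bool
sketch_modulus_acceptable(const unsigned char *n, size_t len) {
	return sketch_bits(n, len) >= SKETCH_MIN_MODULUS_BITS;
}
```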
Assisted-by: Claude:claude-opus-4-7
The wire-format RSA DNSKEY parser was the only key path with no upper
bound on the public exponent — opensslrsa_parse and opensslrsa_fromlabel
already cap at RSA_MAX_PUBEXP_BITS. An attacker-controlled DNSKEY could
therefore force a validator to compute s^e mod n with e up to ~|n| bits,
amplifying every verify by ~120x for typical 2048-bit moduli (OpenSSL
itself only caps the exponent for moduli above 3072 bits). Apply the
same bit-count cap to wire-format keys.
Assisted-by: Claude:claude-opus-4-7
isc__ratelimiter_tick() and isc_ratelimiter_shutdown() each pulled
events out of rl->pending into a function-local list, dropped the
mutex, and then iterated. ISC_LIST_APPEND leaves the link in the
LINKED state, so a concurrent isc_ratelimiter_dequeue() saw an
event as still queued, called ISC_LIST_UNLINK against rl->pending —
which patched the prev/next of the local list — and freed the
event before dispatch finished, producing either an INSIST in the
unlink macro or a use-after-free in the dispatch loop.
isc_async_run() is a non-blocking wfcq enqueue, so there is no
benefit to dropping the mutex around it. Unlink each event and
hand it to isc_async_run() while still holding rl->lock; the
existing ISC_LINK_LINKED check in dequeue then correctly
distinguishes "still queued and cancellable" from "already taken".
Assisted-by: Claude:claude-opus-4-7
For a query whose qname is the root, the labels==1 branch in
redirect2() called dns_name_copy(redirectname, view->redirectzone)
with arguments reversed, overwriting the view-global
nxdomain-redirect target with the empty redirectname rather than
copying the configured target into the per-query lookup name. After
the corruption, view->redirectzone names the root, so
dns_name_issubdomain() makes redirect2() short-circuit for every
subsequent query and the nxdomain-redirect feature stops working
until named is restarted.
Triggering this needs the resolver to receive an NXDOMAIN for the
root from upstream, which does not happen in normal DNS operation.
Swap the arguments to match the dns_name_copy(source, dest)
signature. Add a system test that issues a root query through the
nxdomain-redirect resolver and verifies the redirect feature still
works for a normal NXDOMAIN-producing query afterwards.
Assisted-by: Claude:claude-opus-4-7
hmac_generate() declared its on-stack nonce buffer as
unsigned char data[ISC_MAX_MD_SIZE], i.e. 64 bytes. That is the maximum
digest size, but the buffer is filled up to the algorithm's HMAC block
size, which is 128 bytes for SHA-384 and SHA-512. Asking rndc-confgen
for an HMAC-SHA-384 or HMAC-SHA-512 key with -b > 512 (the documented
range allows up to 1024) wrote past the end of the stack buffer; on
hardened builds this aborted with a stack-smash detector firing
instead of producing a key.
Use the existing ISC_MAX_BLOCK_SIZE (128) for the buffer so the full
1..1024 range advertised by -A hmac-sha{384,512} works as documented.
The matching key_rawsecret[64] in confgen's generate_key() is enlarged
the same way so the generated key fits when dumped to the buffer.
Add a system test that exercises rndc-confgen across the previously
overflowing keysizes; with -Db_sanitize=address it caught the abort
before the fix.
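The size arithmetic behind the overflow can be checked with a small
sketch. The two sizes are the ones quoted above; the sketch_* names
are hypothetical helpers, not functions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MD_SIZE 64U     /* largest digest size, in bytes */
#define SKETCH_MAX_BLOCK_SIZE 128U /* largest HMAC block size, in bytes */

/* Bytes needed to hold a key of 'bits' bits, rounded up. */
static size_t
sketch_key_bytes(unsigned int bits) {
	return ((size_t)bits + 7) / 8;
}

/* Whether a key of 'bits' bits fits in a buffer of 'bufsize' bytes. */
static bool
sketch_key_fits(unsigned int bits, size_t bufsize) {
	assert(bits > 0);
	return sketch_key_bytes(bits) <= bufsize;
}
```

With a 64-byte buffer, any keysize above 512 bits needs at least 65
bytes, while a 128-byte buffer covers the whole range up to 1024 bits.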
Assisted-by: Claude:claude-opus-4-7
The function existence-checked the target with stat() and then opened
the same path without O_NOFOLLOW, so a symlink at the target path
passed the regular-file test against the link's destination and the
open() that followed truncated and wrote through the link.
rndc-confgen -a is typically run as root and writes the keyfile under
a directory that service accounts may have write access to, so a stray
symlink there would silently redirect the truncate, fchown, and
overwrite to whatever file the link pointed at.
Switch the existence check to lstat() and use S_ISREG() so a symlink's
S_IFLNK mode is detected directly (a plain bitmask of S_IFREG matches
both, since S_IFLNK shares its high bit). Add O_NOFOLLOW to both
open() flag sets to close the lstat/open TOCTOU window. Hardening
against unexpected symlinks on intermediate path components is out of
scope.
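The check-then-open pattern described above might look like this. It
is a simplified sketch using only POSIX calls; sketch_safe_open() is a
hypothetical name and leaves out the ownership and permission handling
of the real function.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open 'path' for writing only if it is absent or a regular file.
 * Returns a file descriptor, or -1 with errno set.
 */
static int
sketch_safe_open(const char *path, mode_t mode) {
	struct stat sb;
	int flags = O_WRONLY | O_NOFOLLOW;

	assert(path != NULL);

	if (lstat(path, &sb) == 0) {
		/* lstat() reports the link itself, so S_ISREG() rejects it. */
		if (!S_ISREG(sb.st_mode)) {
			errno = EINVAL;
			return -1;
		}
		flags |= O_TRUNC;
	} else if (errno == ENOENT) {
		flags |= O_CREAT | O_EXCL;
	} else {
		return -1;
	}

	/*
	 * O_NOFOLLOW makes open() fail if a symlink was swapped in
	 * between the lstat() above and this call.
	 */
	return open(path, flags, mode);
}
```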
Assisted-by: Claude:claude-opus-4-7
DNS_MASTER_NOINCLUDE was defined to suppress $INCLUDE processing, but
no caller ever set it, so the guarded code path was dead and the flag
gave the false impression that named-checkzone could be hardened
against untrusted input. The zone-file parser cannot safely read text
from a less-trusted source than the user running the tool: $INCLUDE
opens any local file readable by that user, and fragments of its
contents leak through tokenizer error messages.
Rather than wire up an opt-in flag that suggests this is a supported
mode, remove the dead flag and the dead guard, and document in the
named-checkzone and named-compilezone manual pages that these tools
must not be run on zone text from an untrusted source.
Assisted-by: Claude:claude-opus-4-7
When processing a referral, the `cache_delegns()` function was accepting
glues from a different parent. For instance:
```
AUTHORITY
test.example. NS ns.test.example.
test.example. NS ns.foo.example.
test.example. NS ns.bar.
ADDITIONAL
ns.bar. A 1.2.3.4
ns.foo.example. A 5.6.7.8
ns.test.example. A 9.8.7.6
```
In such a situation, only the glue for `ns.foo.example.` and
`ns.test.example.` should be used; the glue for `ns.bar.` should be
ignored because it is neither a sub-domain nor a sibling domain: its
parent is different (`bar.` instead of `example.`). This is now fixed.
Sibling glue and cyclic sibling glues are defined in RFC 9471 section
2.2 and section 2.3.
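The acceptance rule for the example above can be sketched as a
label-aligned suffix match on the nameserver name. This is a
string-based illustration only: sketch_is_subdomain() is a
hypothetical helper, and it assumes absolute names in presentation
form with no escaped dots, whereas the resolver compares dns_name_t
objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/*
 * True if 'name' is equal to or below 'zone'. Both must be absolute
 * names with a trailing dot. The match is case-insensitive and must
 * start on a label boundary.
 */
static bool
sketch_is_subdomain(const char *name, const char *zone) {
	assert(name != NULL && zone != NULL);

	size_t nlen = strlen(name);
	size_t zlen = strlen(zone);

	if (nlen == 0 || zlen == 0 || zlen > nlen) {
		return false;
	}
	if (strcmp(zone, ".") == 0) {
		return name[nlen - 1] == '.';
	}
	if (strcasecmp(name + (nlen - zlen), zone) != 0) {
		return false;
	}
	return nlen == zlen || name[nlen - zlen - 1] == '.';
}
```

For the referral shown above, glue would be kept only when the
nameserver name passes this test against `example.`.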
OPENSSL_cleanup() in OpenSSL 4 doesn't free the memory, and that is
not compatible with BIND 9's memory leak detection code. Don't use
custom allocation/deallocation functions for OpenSSL's internal memory
management in the ossl3.c module.
See https://github.com/openssl/openssl/pull/29721
Compute qpzone_get_lock(elem->node) into a local variable while the
heap lock is still held, rather than dereferencing the stale elem
pointer after releasing the lock. A concurrent thread running
setsigningtime() (e.g. via IXFR apply on a worker thread) could free
the top-of-heap element between the heap lock release and the
dereference, causing a use-after-free.
In the dns_zonefetch mechanism, some option flags for
dns_resolver_createfetch() were used for all fetches, but
were actually only needed by the DNSKEY refresh fetches.
(Specifically, these options were DNS_FETCHOPT_UNSHARED
and DNS_FETCHOPT_NOCACHED, which were used along with
DNS_FETCHOPT_NOVALIDATE to ensure we get a new copy of
the DNSKEY as it is currently published by the authority,
without prior validation. Those conditions are needed
for RFC 5011 trust anchor maintenance, but not when looking
up parent-NS or DSYNC RRsets.)
Instrument the delegation cache (introduced to back both NS-based and
DELEG-based delegations) with 11 USDT probes in the libdns provider so
that hit rate, eviction pressure, and lookup latency can be measured
without recompiling or enabling logging.
The probes are:
- delegdb_lookup_start / delegdb_lookup_done wrap dns_delegdb_lookup()
and pass the query name plus the result code.
- delegdb_insert_start / delegdb_insert_done wrap dns_delegset_insert().
The early SHUTTINGDOWN return is funneled through the cleanup label
so the done probe fires on every path.
- delegdb_cleanup_start / delegdb_cleanup_done bracket the SIEVE-based
eviction triggered when the cache goes overmem, reporting the number
of bytes requested and actually reclaimed. An additional per-node
delegdb_evict probe (guarded by _ENABLED() because it fires inside
the loop) exposes which zones are being evicted.
- delegdb_create, delegdb_reuse, and delegdb_shutdown trace the per-view
lifecycle across server reloads.
- delegdb_delete traces rndc flush-delegation paths, reporting whether
a subtree or single name was removed.
Name arguments are stringified with dns_name_format() behind
LIBDNS_*_ENABLED() guards so that the hot lookup and insert paths remain
zero-cost when no consumer is attached.
SIG (24) and NXT (30) are obsolete DNSSEC record types, superseded by
RRSIG and NSEC in RFC 3755. Allowing them through dynamic update
exposes two distinct bugs that the surrounding GL#5818 work already
fixes as defense-in-depth:
- dns__db_findrdataset() used to REQUIRE that (covers == 0 ||
type == RRSIG), which aborts named when a SIG update reaches the
prescan foreach_rr() call. Fixed to accept dns_rdatatype_issig().
- diff.c rdata_covers() used to test only RRSIG, dropping the
covered-type field for SIG rdatas; the zone DB then filed every
SIG rdataset under typepair (SIG, 0) instead of
(SIG, covered_type) and follow-up adds collided at that bucket.
Fixed to use dns_rdatatype_issig().
Both underlying bugs are still reachable via inbound zone transfer
(diff.c rdata_covers() runs from both dns_diff_apply on the IXFR path
and dns_diff_load on the AXFR path), so the type-helper fixes above
remain necessary. For the dynamic-update path, the simplest and
safest posture is to refuse SIG and NXT outright at the front door in
ns/update.c, alongside the existing NSEC/NSEC3/non-apex-RRSIG
refusals. KEY remains permitted because it is still used to carry
public keys for SIG(0) transaction authentication.
The existing tcp-self SIG regression test is repointed to assert
REFUSED on the SIG add, a symmetric NXT test is added, and the
SIG-via-dyn-update covers-bucket test is removed because it is no
longer reachable through this entry point; AXFR-based coverage of
diff.c rdata_covers() follows in a separate commit.
rdata_covers() in lib/dns/diff.c discriminated only on
dns_rdatatype_rrsig (46) and returned 0 for the legacy SIG (24), so
the covered-type field was silently discarded on the dynamic-update
and IXFR paths. Every SIG rdataset was then filed in the zone DB
under typepair (SIG, 0) instead of (SIG, covered_type); a second SIG
add with a different covers and a different TTL collided at that
bucket, tripped DNS_DBADD_EXACTTTL in qpzone, returned
DNS_R_NOTEXACT, and came back to the client as SERVFAIL.
Use dns_rdatatype_issig() here so both SIG and RRSIG carry their
covers through the diff, matching the helper pattern already used in
lib/dns/master.c, lib/ns/xfrout.c, lib/dns/qpcache.c, and the
dns__db_findrdataset() REQUIRE that the surrounding merge request
just relaxed.
dns__db_findrdataset() had a REQUIRE() that only accepted
dns_rdatatype_rrsig when the covers parameter was set. A dynamic
update containing a SIG record (type 24) would trigger this
assertion, crashing named. Use dns_rdatatype_issig() to accept
both SIG and RRSIG.
resign_sooner_values() only checked whether rhs was SOA-typed when
resign times were equal, but did not check lhs. When both entries were
SOA-typed with equal resign times, the comparison returned true in both
directions, violating irreflexivity and corrupting heap invariants.
Add lhs_typepair parameter and require lhs to be non-SOA for the
tie-breaking logic to apply.
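A minimal sketch of the corrected comparison. The struct and function
names are hypothetical, and a boolean stands in for the typepair
check; the point is only that the tie-break needs lhs to be non-SOA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_entry {
	long resign; /* resign time */
	bool is_soa; /* stand-in for "typepair is SOA" */
} sketch_entry_t;

/*
 * Strict "resigns sooner" ordering. On equal resign times, a non-SOA
 * entry sorts before an SOA entry; two SOA entries (or two non-SOA
 * entries) compare as equivalent, so sooner(a, a) is always false.
 */
static bool
sketch_resign_sooner(const sketch_entry_t *lhs, const sketch_entry_t *rhs) {
	assert(lhs != NULL && rhs != NULL);

	if (lhs->resign != rhs->resign) {
		return lhs->resign < rhs->resign;
	}
	return !lhs->is_soa && rhs->is_soa;
}
```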
With the parent-centric resolver, dns_view_bestzonecut() consults the
delegation DB (view->deleg) rather than the main cache for the closest
zonecut. Root is never the target of a referral, so it never lands in
delegdb; bestzonecut therefore falls through to the hints lookup on
every query whose closest ancestor is root. prime_done() only called
dns_root_checkhints(), which logs discrepancies but does not update
any store bestzonecut looks at, so the fresh root NS records obtained
by priming were never used and priming kept re-firing.
Rename view->hints to view->rootdb and refresh it when a priming
fetch completes: the '.' NS rdataset is replaced with the fetched
one, and for each listed nameserver the matching A/AAAA glue is
copied from the response's ADDITIONAL section. Only glue for names
that actually appear as NS targets is accepted, so a hostile response
cannot inject unrelated records. Glue the response did not carry is
left untouched, so the hints-file records loaded at startup remain as
a fallback.
Each view gets its own rootdb: the previous shared
named_g_server->in_roothints is gone, and configure_view() calls
dns_rootns_create() per view when the class-IN defaults are needed.
That keeps the priming writer one-per-DB, so concurrent priming in
different views cannot race on the same zone-DB version.
The rootdb refresh runs synchronously from the resolver response path,
so records go straight from the wire into rootdb with no cache round
trip and no dependency on DNSSEC validation state. A new
DNS_FETCHOPT_PRIMING option marks the priming fetch; prime_done()
itself is now pure cleanup.
Track the rootdb freshness window in view->rootdb_expires and trigger
re-priming lazily from dns_view_find() and bestzonecut_rootdb() only
when the window has elapsed. Stale records are still served while the
fresh priming fetch is in flight.
Drop dns_root_checkhints() and its helpers; the rootdb is now the
authoritative source the resolver consults.
The filename of each catalog member zone is generated dynamically
based on the zone's name. If the zone's name is too long or if it
contains special characters the name's digest is used instead.
Since '%' and '$' are now treated as special characters in the zone
names (see !10779), add these characters to the list of the special
characters.
The setfilename() function uses case-insensitive strcasestr() when
matching the possible tokens, but then one of the token parsers
uses case-sensitive INSIST checks which can assert when, for example,
matching '%X' and INSIST only accepts '%x'.
The case-insensitivity is documented, which means it's the parser
that needs to be fixed, not the matcher.
Convert the character to lowercase before checking the token's
validity.
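The normalization step can be sketched as below. sketch_token_is() is
a hypothetical helper, not the setfilename() parser, and the plain
assert() stands in for INSIST.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Compare a token letter taken from the input against the lowercase
 * letter the parser expects, ignoring case. The casts avoid undefined
 * behaviour in the <ctype.h> functions for negative char values.
 */
static bool
sketch_token_is(char seen, char expected_lower) {
	assert(islower((unsigned char)expected_lower));
	return tolower((unsigned char)seen) == expected_lower;
}
```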
The `DNS_DBFIND_NOEXACT` flag name is ambiguous, as it does not clearly
indicate the lookup behavior (e.g., sibling, child, or parent).
Rename it to `DNS_DBFIND_ABOVE` to better reflect that the lookup
targets a closer ancestor name.
Expired delegation nodes are naturally replaced when the resolver
fetches fresh data, and any remaining stale nodes are reclaimed by
SIEVE eviction under memory pressure.
delegdb_cleanup() was overwriting the caller-supplied 'requested'
value with (hiwater - lowater), so every overmem cleanup tried to
free the full watermark band regardless of how much memory the new
delegation actually needed. Drop the override so the caller's size
is used: we now walk the SIEVE only until we have reclaimed enough
room for the new node, leaving unrelated entries in place.
dns_delegset_fromnsrdataset() used isc_g_mctx for the transient
delegset it builds from a DNS NS rdataset. That hides delegation
data in the global default context instead of accounting it against
the subsystem that owns it: a resolver fctx, a view, or a query
context.
Take an explicit mctx parameter so callers can direct the allocation
to the right place, and update the three call sites:
- lib/dns/view.c:1189 (dns_view_bestzonecut fallback) uses view->mctx
- lib/dns/resolver.c:7071 (resume_dslookup) uses fctx->mctx
- lib/ns/query.c:8672 (query_delegation_recurse) uses the client
manager's mctx
Also tighten delegdb cleanup to run inside the same write transaction
as the insert: delegdb_node_prepare() now returns the size of the new
node, and delegdb_cleanup() takes the caller's open qp so that the
overmem reclamation and the insert share one commit instead of doing
two nested write transactions.
dns__deleg_lookup() with DNS_DBFIND_NOEXACT is supposed to return
the deepest proper ancestor of the lookup name. It called
getparentnode() to step up from an exact match, but getparentnode()
only iterated while the chain length was >= 2. When the chain
contained a single entry (the exact match itself with no ancestor
stored in the trie), the loop did not execute and left the caller
looking at the exact match. The subsequent isactive() check then
returned success and the function reported the exact match as the
"deepest ancestor", violating NOEXACT semantics.
This was observable as the resolver picking the child-side
delegation for an at-parent type (e.g. a DS query for a TLD), then
sending the query to the child's own nameservers and recovering via
the "chase DS servers" path.
Have getparentnode() set '*node' to NULL when it cannot find an
active proper ancestor, and make dns__deleg_lookup() NULL-check
before returning, matching the canonical NOEXACT implementation in
dns_zt_find(). Update the deleg unit test to expect NOTFOUND for
the top-level-no-parent case.
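The intended behaviour can be sketched with an array standing in for
the lookup chain, ordered from the shallowest ancestor to the exact
match. The types and names are hypothetical and do not mirror the
delegation DB structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_node {
	const char *name;
	bool active;
} sketch_node_t;

/*
 * Return the deepest active proper ancestor in 'chain', or NULL when
 * there is none. chain[len - 1] is the exact match and is never
 * returned, so a single-entry chain always yields NULL.
 */
static const sketch_node_t *
sketch_parent_node(const sketch_node_t *chain, size_t len) {
	assert(chain != NULL || len == 0);

	for (size_t i = len; i-- > 1;) {
		if (chain[i - 1].active) {
			return &chain[i - 1];
		}
	}
	return NULL;
}
```

A caller following this pattern would treat a NULL result as
"not found" instead of falling back to the exact match.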
When the validator needs a DS RRset and the cache does not have it,
get_dsset() falls back to creating a fresh fetch. Without a hint, the
resolver picks the closest known zone cut for the DS query, and in the
parent-centric resolver that can land on a delegation at the DS owner
name itself (the child side). This can happen when the parent
delegation is expired, or if the zonecut of the parent doesn't match the
labels in the name.
Querying the child for its own DS records yields NODATA from the apex of
the zone, which sends the resolver into the "chase DS servers" recovery
path and costs two extra round trips for a parent delegation we already
had cached in the delegation database.
Look up the parent zone in the delegation database before kicking
off the fetch, and pass any usable delegation to the resolver as a
hint. When the hint is present, the resolver sends the DS query
straight to the parent's nameservers and the chase path is avoided
entirely.
To support this, create_fetch() now takes optional 'domain' and
'delegset' parameters that are forwarded to dns_resolver_createfetch().
All other call sites pass NULL.
When a zone filename is defined in named.conf which will be
written to by the server - i.e., secondary or dynamically updated
zones - there is a test at configuration time to ensure that the
filename is unique.
This test is run before the zone is actually created, so a zone
configured using a template may not have had its filename expanded
yet. This can cause a configuration to fail because, for example,
multiple zones appear to be using the filename "$name.db".
This has been fixed by calling dns_zone_expandzonefile() from
isccfg_check_zoneconf(), to expand the names when checking for
uniqueness.
This adds a new API call dns_zone_expandzonefile(), which will enable
named-checkconf to expand filenames the same way the server does in
dns_zone_setfile().
When processing a catalog zone member's primaries definition, if
there was a TXT record containing an invalid TSIG key name,
dns_name_free was incorrectly called, triggering an assertion.
This has been fixed.
Move disptype and transport into dispatch_hash() and dispatch_match()
so that the match function is the single source of truth for whether
two TCP dispatches are interchangeable. This replaces the post-loop
disptype filter in dispatch_gettcp() and makes the disptype field in
struct dispatch_key actually used.
TCP dispentries no longer use the global QID hash table at all.
Responses are matched by scanning disp->active, and sequential
per-dispatch IDs (bounded by the pipelining limit) are unique
within a single dispatch by construction. Since TCP delivers
only data we asked for on a specific connection, the per-peer
uniqueness that the global table enforced was never actually
needed for TCP.
DNS_DISPATCHOPT_FIXEDID is plumbed through dns_request_createraw
-> get_dispatch -> dns_dispatch_createtcp so FIXEDID TCP requests
always get a fresh isolated dispatch — the caller-supplied ID
then cannot collide with any other in-flight query either.
Cap the number of in-flight queries on a single shared TCP dispatch.
When the limit is reached, the dispatch is removed from the hash
table so subsequent queries get a fresh connection. The existing
dispatch continues serving its queries until they complete.
This bounds the blast radius of a connection drop: at most N queries
fail simultaneously instead of all queries to that server.
The default limit is 256. It can be overridden for testing via
'named -T tcppipelining=N'.
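The bookkeeping can be sketched as a counter plus a "shareable" flag.
The struct and function names are hypothetical and leave out the hash
table and locking; only the default of 256 comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TCP_PIPELINING_DEFAULT 256U

typedef struct sketch_dispatch {
	unsigned int inflight; /* queries currently outstanding */
	unsigned int limit;    /* cap on queries for this dispatch */
	bool in_table;         /* still offered to new queries? */
} sketch_dispatch_t;

static void
sketch_dispatch_init(sketch_dispatch_t *disp, unsigned int limit) {
	disp->inflight = 0;
	disp->limit = limit;
	disp->in_table = true;
}

/*
 * Attach one more query. When the cap is reached the dispatch is
 * withdrawn from sharing, so later queries get a fresh connection.
 * Returns whether the dispatch is still shareable afterwards.
 */
static bool
sketch_dispatch_add_query(sketch_dispatch_t *disp) {
	assert(disp != NULL && disp->in_table);
	assert(disp->inflight < disp->limit);

	disp->inflight++;
	if (disp->inflight == disp->limit) {
		disp->in_table = false;
	}
	return disp->in_table;
}

/* A query finished; the dispatch keeps serving the remaining ones. */
static void
sketch_dispatch_query_done(sketch_dispatch_t *disp) {
	assert(disp != NULL && disp->inflight > 0);
	disp->inflight--;
}
```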