The coccinellery repository provides many little semantic patches to fix common
problems in the code. The number of semantic patches in the coccinellery
repository is high and most of the semantic patches apply only for Linux, so it
doesn't make sense to run them on regular basis as the processing takes a lot of
time.
The list of issue found in BIND 9, by no means complete, includes:
- double assignment to a variable
- `continue` at the end of the loop
- double checks for `NULL`
- useless checks for `NULL` (cannot be `NULL`, because of earlier return)
- using `0` instead of `NULL`
- useless extra condition (`if (foo) return; if (!foo) { ...; }`)
- removing & in front of static functions passed as arguments
The dns_name_copy() function followed two different semanitcs that was driven
whether the last argument was or wasn't NULL. This commit splits the function
in two where now third argument to dns_name_copy() can't be NULL and
dns_name_copynf() doesn't have third argument.
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument. This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.
The dns_name_copy() function cannot fail gracefully when the last argument
(target) is NULL. Add RUNTIME_CHECK()s around such calls.
The first semantic patch adds RUNTIME_CHECK() around any call that ignores the
return value and is very safe to apply.
The second semantic patch attempts to properly add RUNTIME_CHECK() to places
where the return value from `dns_name_copy()` is recorded into `result`
variable. The result of this semantic patch needs to be reviewed by hand.
Both patches misses couple places where the code surrounding the
`dns_name_copy(..., NULL)` usage is more complicated and is better suited to be
fixed by a human being that understands the surrounding code.
The libidn2 library on Ubuntu Bionic is broken and idn2_to_unicode_8zlz() does't
fail when it should. This commit ensures that we don't run the system test for
valid A-label in locale that cannot display with the buggy libidn2 as it would
break the tests.
It is possible dig used ACE encoded name in locale, which does not
support converting it to unicode. Instead of fatal error, fallback to
ACE name on output.
Bring the files describing Windows-specific aspects of building and
installing BIND up to date. Remove the parts which are either outdated
(e.g. 32-bit build instructions), already included elsewhere (e.g. the
list of Windows systems BIND is known to run on), or inconvenient to
keep up to date in the long run (e.g. ARM chapter numbers).
Ensure BIND can be tested on Windows in GitLab to more quickly catch
build and test errors on that operating system.
Some notes:
- While build jobs are triggered for all pipelines, system test jobs
are not - due to the time it takes to run the complete system test
suite on Windows (about 20 minutes), the latter are only run for
pipelines created through GitLab's web interface and for pipelines
created for Git tags.
- Only the "Release" build configuration is currently used. Adding
"Debug" builds is a matter of extending .gitlab-ci.yml, but it was
not done for the time being due to questionable usefulness of
performing such builds in GitLab CI.
- Only a 64-bit build is performed. Adding support for 32-bit builds
is not planned to be implemented.
- Unit tests are still not run on Windows, but adding support for that
is on the roadmap.
- All Windows GitLab CI jobs are run inside Windows Server containers,
using the Custom executor feature of GitLab Runner as Windows Server
2016 is not supported by GitLab Runner's native Docker on Windows
executor and Windows Server 2019 is not yet widely available from
hosting providers.
- The Windows Docker image used by GitLab CI is not stored in the
GitLab Container Registry as it is over 27 GB in size and thus
passing it between GitLab and its runners is impractical.
- There is no vcvarsall.bat variant written in PowerShell and batch
scripts are no longer supported by GitLab Runner Custom executor, so
the environment variables set by vcvarsall.bat are injected back
into the PowerShell environment by processing the output of "set".
- Visual Studio parallel builds are a bit different than "make -jX"
builds as parallelization happens in two tiers: project parallelism
(controlled by the "/maxCpuCount" msbuild.exe switch) and compiler
parallelism (controlled by the "/MP" cl.exe switch). To limit the
total number of compiler processes spawned concurrently to a value
similar to the one used for Unix builds, msbuild.exe is allowed to
build at most 2 projects at once, each of which can spawn up to half
of BUILD_PARALLEL_JOBS worth of compiler processes. Using such
parameters is a fairly arbitrary decision taken to solve the
trade-off between compilation speed and runner load.
- Configuring network addresses in Windows Server containers is
tricky. Adding 10.53.0.1/24 and similar addresses to the vEthernet
interface created by Docker never causes ifconfig.bat to fail, but
in fact only one container can have any given IP address configured
at any given time (the request to add the same address in another
container is silently ignored). Thus, in order to allow multiple
system test jobs to be run in parallel, the addresses used in system
tests are configured on the loopback interfaces. Interestingly
enough, the addresses set on the loopback interfaces... persist
between containers. Fortunately, this is acceptable for the time
being and only requires ifconfig.bat failures to be ignored (as
ifconfig.bat will fail if it attempts to configure an already
existing address on an interface). We also need to wait for a brief
moment after calling ifconfig.bat as the addresses the latter
attempts to configure may not be immediately available after it
returns (and that causes runall.sh to error out). Finally, for some
reason we also need to signal that the DNS servers on each loopback
interface are to be configured using DHCP or else ifconfig.bat will
fail to add the requested addresses.
- Since named.pid files created by named instances used in system
tests contain Windows PIDs instead of Cygwin PIDs and various
versions of Cygwin "kill" react differently when passed Windows PIDs
without the -W switch, all "kill" invocations in GitLab CI need to
use that switch (otherwise they would print error messages which
would cause stop.pl to assume the process being killed died
prematurely). However, to preserve compatibility with older Cygwin
versions used in our other Windows test environments, we alter the
relevant scripts "on the fly" rather than in the Git repository.
- In the containers used for running system tests, Windows Error
Reporting is configured to automatically create crash dumps in
C:\CrashDumps. This directory is examined after the test suite is
run to ensure no crashes went under stop.pl's radar.
The SYSTEMTESTTOP variable is set by bin/tests/system/run.sh. When
system tests are run on Windows, that variable will contain an absolute
Cygwin path. In the case of the "statschannel" system test, using the
unmodified SYSTEMTESTTOP variable in tests.sh causes the RNDCCMD
variable to contain an invocation of a native Windows application with
an absolute Cygwin path passed as a parameter, which prevents rndc from
working in that system test. Until we have a cleaner solution, override
SYSTEMTESTTOP with a relative path to work around the issue and thus fix
the "statschannel" system test on Windows.
Make sure the CYGWIN environment variable is set whenever system tests
are run on Windows to prevent stop.pl from making incorrect assumptions
about the environment it is running in, which triggers e.g. false
reports about named instances crashing on shutdown when system tests are
run on Windows. This issue has not been caught earlier because the
CYGWIN environment variable was incidentally being set on a higher level
in our Windows test environments.
Error reporting for parallel system tests on Windows has been broken all
along: since all parallel.mk targets generated by parallel.sh pipe their
output through "tee", the return code from run.sh is lost and thus
running "make -f parallel.mk check" will not yield a non-zero return
code if some system tests fail. The same applies to runsequential.sh.
Yet, runall.sh on Windows only sets its return code to a non-zero value
if either "make -f parallel.mk check" or runsequential.sh returns a
non-zero return code. Fix by making runall.sh yield a non-zero return
code when testsummary.sh fails, which is the same approach as the one
used in the "test" target in bin/tests/system/Makefile.
Until now, the build process for BIND on Windows involved upgrading the
solution file to the version of Visual Studio used on the build host.
Unfortunately, the executable used for that (devenv.exe) is not part of
Visual Studio Build Tools and thus there is no clean way to make that
executable part of a Windows Server container.
Luckily, the solution upgrade process boils down to just adding XML tags
to Visual Studio project files and modifying certain XML attributes - in
files which we pregenerate anyway using win32utils/Configure. Thus,
extend win32utils/Configure with three new command line parameters that
enable it to mimic what "devenv.exe bind9.sln /upgrade" does. This
makes the devenv.exe build step redundant and thus facilitates building
BIND in Windows Server containers.
Build configuration for the dnssec-cds Visual Studio project is absent
from the solution file template, which means the solution needs to be
upgraded using "devenv bind9.sln /upgrade" in order for the dnssec-cds
project to be built. Add the build configuration for dnssec-cds to the
solution file template so that upgrading the solution is not necessary
for building that project.
named-checkzone does not use libbind9. Update the Visual Studio project
file template for named-checkzone to reflect that, thus preventing
compilation issues during parallel builds.
When commit 8eb88aafee removed liblwres,
it also modified nsupdate to use libirs instead of liblwres, but the
Visual Studio project files were not updated to reflect that change.
Make sure the nsupdate Visual Studio project depends on the libirs
project to prevent compilation issues during parallel builds.
Make stderr fully buffered on Windows to improve named performance when
it is logging to stderr, which happens e.g. in system tests. Note that:
- line buffering (_IOLBF) is unavailable on Windows,
- fflush() is called anyway after each log message gets written to the
default stderr logging channels created by libisc.
BIND system tests are run in a Cygwin environment. Apparently Cygwin
shell sets the SEM_NOGPFAULTERRORBOX bit in its process error mode which
is then inherited by all spawned child processes. This bit prevents the
Windows Error Reporting dialog from being displayed, which I assume is
part of an effort to contain memory handling errors triggered by Cygwin
binaries in the Cygwin environment. Unfortunately, this also prevents
automatic crash dump creation by Windows Error Reporting and Cygwin
itself does not handle memory errors in native Windows processes spawned
from a Cygwin shell.
Fix by clearing the SEM_NOGPFAULTERRORBOX bit inside named if it is
started in a Cygwin environment, thus overriding the Cygwin-set process
error mode in order to enable Windows Error Reporting to handle all
named crashes.