The format package is used by ktesting, both to reconfigure Gomega and to
format errors, therefore it has to be moved to staging together with ktesting,
if or when we get to that because those are desirable features.
Because format only has the YAML package as additional dependency and that
should be okay for all other repos (except for the YAML package itself, of
course), we can publish the format package as a sub-package of such a future
ktesting module.
Avoiding the dependency on apimachinery to detect unstructured.Unstructured is
a bit tricky, but doable by relaxing what we check for. The test/utils/format
package is kept to test ktesting/format with the actual packages that it cannot
depend on (apimachinery, api).
Depending on component-base/logs/testinit was convenient and avoided any doubts
about the init order, but isn't acceptable long-term as an additional
dependency because component-base is too big. The same functionality (flag
registration) can also be implemented directly in ktesting. Because Go 1.21
clarified the order in which independent packages get initialized, we know for
sure that "our" code runs after testinit and can handle a potential conflict.
While at it, introduce a KTESTING_VERBOSITY env variable to enable increasing
the default verbosity in CI jobs which run a mixture of tests where some don't
use ktesting and thus don't accept a -v=<something> parameter.
These hints showed for the client-go/ktesting because there the code is new.
They also apply exactly the same way to the old code, so both gets updated.
The client-go variant of ktesting is a superset of the normal
ktesting, which makes it possible to get the full original
functionality simply by changing the import path.
The goal is to make ktesting available for unit testing in *all* Kubernetes
packages. To achieve that, it must not depend on packages which themselves
depend on other Kubernetes packages. client-go was the biggest of those
dependencies (but not the only one, see below), so it can't be part of the
TContext API.
How to to bring back passing of those values via a TContext is to be
decided. Options are:
- via WithValue
- by wrapping TContext
k8s.io/component-base/logs is another problematic dependency that is going to
be harder to resolve. Others are just work (testify!).
To prevent regressing accidentally, import-boss is now used to check
dependencies.
As a special case, WithContext preserved the logger in the parent context. But
for the upcoming usage of WithValue to store a Kubernetes client it is
important to also preserve access to other values.
Deadline is available inside a synctest bubble, but calling it panics. To
support constructing a TContext inside a bubble, we have to catch the panic
because there is no API to detect a bubble in advance. Detecting a panic is
then also used to set the result of TContext.IsSyncTest.
While at it, cleaning up the code a bit and adding unit tests for the Deadline
behavior.
Not canceling the parent context made sense, but the new context should
be cancelable like any other TContext. Found when passing tCtx.WithoutCancel()
to StartTestServer and the tear-down function got stuck because it couldn't
cancel the context.
The type alias made `go doc ./test/utils/ktesting.TContext` useless and was a
weird workaround for preserving the original interface type name. Passing a
TContext instance by value (almost) preserves the original API and is
acceptable because the struct is still small. The only consumers which need to
be updated are those which relied on passing nil as tCtx.
If we ever find that TContext is or becomes too large, then we can make it
a wrapper around some pointer.
I've not been able to trigger the flake, but it could happen:
- time.Sleep unblocks some background goroutines inside the synctest bubble.
- Those goroutines do not actually run yet.
- The main test checks for the result of those goroutines.
Adding a `synctest.Wait` ensures that all background processing is complete
because it waits for all goroutines to be durably blocked.
If the goroutine happens to log after the test has already terminated,
testing.T.Log panics. We must ensure that the goroutine has stopped before
allowing the test to terminate.
When aborting an integration test with CTRL-C while it runs,
the current test fails and etcd exits. But additional tests were still being
started and the failed slowly because they couldn't connect to etcd.
It's better to fail additional tests in ktesting.Init when the test run has
already been interrupted.
While at it, also make it a bit more obvious that testing was interrupted by
logging it and update one comment about this and clean up the naming of
contexts in the code.
Example:
$ go test -v ./test/integration/quota
...
I1106 11:42:48.857162 147325 etcd.go:416] "Not using watch cache" resource="events.events.k8s.io"
I1106 11:42:48.857204 147325 handler.go:286] Adding GroupVersion events.k8s.io v1 to ResourceManager
W1106 11:42:48.857209 147325 genericapiserver.go:765] Skipping API events.k8s.io/v1beta1 because it has no resources.
^C
INFO: canceling test context: received interrupt signal
{"level":"warn","ts":"2024-11-06T11:42:48.984676+0100","caller":"embed/serve.go:160","msg":"stopping insecure grpc server due to error","error":"accept tcp 127.0.0.1:44177: use of closed network connection"}
...
I1106 11:42:50.042430 147325 handler.go:142] kube-apiserver: GET "/apis/rbac.authorization.k8s.io/v1/clusterroles" satisfied by gorestful with webservice /apis/rbac.authorization.k8s.io/v1
test_server.go:241: timed out waiting for the condition
--- FAIL: TestQuota (11.45s)
=== RUN TestQuotaLimitedResourceDenial
quota_test.go:292: testing has been interrupted: received interrupt signal
--- FAIL: TestQuotaLimitedResourceDenial (0.00s)
=== RUN TestQuotaLimitService
quota_test.go:418: testing has been interrupted: received interrupt signal
--- FAIL: TestQuotaLimitService (0.00s)
FAIL
When cleaning up the progress channel properly (stop signal delivery, closing
the channel), the loop dumping progress reports no longer needs to check for
the separate shutdown context. Instead, it can distinguish between "signal
received" and "channel closed".
The signal context was getting cleanup by canceling it, but a channel is better
because it avoids the slightly misleading "received interrupt signal"
cancellation when the test was only shutting down.
The "received interrupt signal" is useful also when running with "go test"
without -v because it shows that the shutdown has started.
But more important is that a progress report gets shown because that feature is
useful in particular when "go test" produces no output while it runs.
The recent change to support importing ktesting into an E2E suite
without progress reporting was flawed:
- If a Go unit test had a deadline (the default when invoked
by `go test`!), the early return skipped initializing progress
reporting.
- When it didn't, for example when invoking a test binary directly
under stress, a test created goroutines which were kept running,
which broke leak checking in e.g. an integration tests TestMain.
The revised approach uses reference counting: as long as some unit test is
running, the progress reporting with the required goroutines are active.
When the last one ends, they get cleaned up, which keeps the goleak
checker happy.
The TestCause tests were already unreliable in the CI. The others failed under
stress.
As synctest we have to be more careful how to construct and clean up the parent
context for TestCause (must happen inside bubble), but once that's handled we
can reliably measure the (fake) time and compare exactly against expected
results.
Manually pairing Being with End is too error prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.
By using a callback we can achieve a few things:
- Code using it automatically shadows the parent tCtx, thus enforcing
that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
in scope again.
Downsides:
- Extra boilerplate for the anonymous function.
Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
As an approximation of that `for tCtx := range tCtx.Step(...)` was
tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.
(cherry picked from commit 047682908d)
Refactoring the DRA upgrade/downgrade testing such that it runs as Go test
depended on supporting ktesting in the E2E framework. That change worked during
presubmit testing, but broke some periodic jobs. Therefore the relevant commits
from https://github.com/kubernetes/kubernetes/pull/135664/commits get reverted:
c47ad64820 DRA e2e+integration: test ResourceSlice controller
047682908d ktesting: replace Begin/End with TContext.Step
de47714879 DRA upgrade/downgrade: rewrite as Go unit test
7c7b1e1018 DRA e2e: make driver deployment possible in Go unit tests
65ef31973c DRA upgrade/downgrade: split out individual test steps
47b613eded e2e framework: support creating TContext
The last one is what must have caused the problem, but the other commits depend
on it.
Bumping to 5 is useful in unit tests. Those tend to not produce less output and
ideally use per-test output, so we end up keeping only the output of failed
tests where increased verbosity also in CI runs is useful.
But ktesting now also gets imported into e2e test binaries through the
framework. There the increased verbosity is apparently causing OOM killing in
some jobs which previously worked fine.
Long term we need a better solution than simply disabling the verbosity
change. We could modify each unit test to call SetDefaultVerbosity, but that's
tedious. Perhaps an env variable? It cannot be a command line flag because not
all unit tests accept `-v`.
Gomega formats errors by first showing Error() (already has all information
after WithError) and then again by dumping the error struct, which is redundant
in this case. We can avoid the latter by providing a GomegaString
implementation which returns nothing.
Being able to call arbitrary functions is useful, even if it means giving up
some compile-time checking. Because we now use reflection,
Eventually/Consistently can be methods.
Manually pairing Being with End is too error prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.
By using a callback we can achieve a few things:
- Code using it automatically shadows the parent tCtx, thus enforcing
that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
in scope again.
Downsides:
- Extra boilerplate for the anonymous function.
Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
As an approximation of that `for tCtx := range tCtx.Step(...)` was
tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.
Many helper packages need to know the test namespace in addition to the client.
Supporting passing of that information through the TContext avoids adding
another parameter to such helper packages.
This mirrors similar functionality in framework.Framework which also provides
a namespace to the test and packages taking such a parameter.
Instead of granting callers access to the instance stored in the context,
let's return a copy. Otherwise a caller might accidentally modify what is
stored in the context when they forget to make a copy themselves before
modifying fields.
The original implementation was inspired by how context.Context is handled via
wrapping a parent context. That approach had several issues:
- It is useful to let users call methods (e.g. tCtx.ExpectNoError)
instead of ktesting functions with a tCtx parameters, but that only
worked if all implementations of the interface implemented that
set of methods. This made extending those methods cumbersome (see
the commit which added Require+Assert) and could potentially break
implementations of the interface elsewhere, defeating part of the
motivation for having the interface in the first place.
- It was hard to see how the different TContext wrappers cooperated
with each other.
- Layering injection of "ERROR" and "FATAL ERROR" on top of prefixing
with the klog header caused post-processing of a failed unit test to
remove that line because it looked like log output. Other log output
lines where kept because they were not indented.
- In Go <=1.25, the `go vet sprintf` check only works for functions and
methods if they get called directly and themselves directly pass their
parameters on to fmt.Sprint. The check does not work when calling
methods through an interface. Support for that is coming in Go 1.26,
but will depend on bumping the Go version also in go.mod and thus
may not be immediately possible in Kubernetes.
- Interface documentation in
https://pkg.go.dev/k8s.io/kubernetes@v1.34.2/test/utils/ktesting#TContext
is a monolithic text block. Documentation for methods is more readable and allows
referencing those methods with [] (e.g. [TC.Errorf] works, [TContext.Errorf]
didn't).
The revised implementation is a single struct with (almost) no exported
fields. The two exceptions (embedded context.Context and TB) are useful because
it avoids having to write wrappers for several functions resp. necessary
because Helper cannot be wrapped. Like a logr.LogSink, With* methods can make a
shallow copy and then change some fields in the cloned instance.
The former `ktesting.TContext` interface is now a type alias for
`*ktesting.TC`. This ensures that existing code using ktesting doesn't need to
be updated and because that code is a bit more compact (`tCtx
ktesting.TContext` instead of `tCtx *ktesting.TContext` when not using such an
alias). Hiding that it is a pointer might discourage accessing the exported
fields because it looks like an interface.
Output gets fixed and improved such that:
- "FATAL ERROR" and "ERROR" are at the start of the line, followed by the klog header.
- The failure message follows in the next line.
- Continuation lines are always indented.
The set of methods exposed via TB is now a bit more complete (Attr, Chdir).
All former stand-alone With* functions are now also available as methods and
should be used instead of the functions. Those will be removed.
Linting of log calls now works and found some issues.
Assert is useful because sometimes one wants to check several different things
in the same test, without stopping after the first failed assertion.
Require gets added for symmetry and to mirror testify's require and
assert. Require is identical to Expect (gomega naming).
We need to see the actual effect that ktesting will have when used as wrapper
around testing.T. Producing an error hid that adding the klog header makes some
output sub-optimal:
<klog header>: FATAL ERROR: ...
The problem with having the klog header at the beginning of the line is that
the Kubernetes `make test` post-processing treats this as log output instead
of the failure message of the test. Will be fixed separately.
This has been replaced by `//build:...` for a long time now.
Removal of the old build tag was automated with:
for i in $(git grep -l '^// +build' | grep -v -e '^vendor/'); do if ! grep -q '^// Code generated' "$i"; then sed -i -e '/^\/\/ +build/d' "$i"; fi; done
A "sync test" runs the test inside a testing/synctest bubble. Time moves
forward in a deterministic fashion when all goroutines are blocked
waiting for time to progress. This simplifies testing concurrent behavior.
ktesting enables writing such tests with a new SyncTest method: it can start a
new sub-test (similar to Run) or turn an existing test (typically a top-level
Test<something>) into a sync test when no new name is given.
TContext.IsSyncTest can be used to check the mode of the current test, which
may be useful in common helper code.
TContext.Wait directly maps to synctest.Wait.
This new functionality is limited to tests which use an underlying testing.T
instance.
This closes a gap compared to the context package. It's useful when combined
with Ginkgo to keep something running beyond the end of the Ginkgo BeforeEach
or It node.
That WithCancel added a deferred cleanup which cancels on test termination was
unexpected. This automatic cancellation makes sense only for the initial root
TContext.
This allows declaring a code region as one step without having to use
an anonymous callback function, which has the advantage that variables
set during the step are visible afterwards.
In Python, this would be done as
with ktesting.Step(tctx) as tcxt:
// some code code inside step
// code not in the same step
But Go has no such construct.
In contrast to WithStep, the start and end of the step are logged, including
timing information.
Hiding the error in WithError is the right choice for example
when it is used inside ktesting.Eventually. Most callers probably want to deal
with the unexpected error themselves. For those who don't, WithErrorLogging
continues to log it.
WithTB was originally defined as "uses the existing logger". But what we want
there and in the newer TContext.Run is the usual per-test logging, now for the
sub-test.
This is not necessarily a problem, some code might use a timeout and expect it
to trigger. Therefore this should only be an info message, not a
warning. Long-term it might be useful to have an API where the caller decides
whether this gets logged.
The caller should use short messages and leave it to the user of those to
provide more context (no pun intended...). When logging, "canceling context" is
that context.
Before:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
WARNING: the operation ran for the configured 2s
After:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
INFO: canceling context: the operation ran for the configured 2s
How exactly a test reacts when its context times out is unclear. In the case of
scheduler_perf, the apiserver started to shut down and the test failure then
was about not being able to reach the apiserver, which was a bit confusing.
To make it more obvious why the shutdown starts, a WARNING message gets added
to the test output by ktesting before cancellation and thus before any other
output related to that cancellation.