Skip the memory pressure PSI test when running with CRI-O until automatic
memory.high configuration is available in the runtime. The test fails on
Fedora CoreOS due to different page cache reclaim behavior, and CRI-O is
implementing a fix to automatically set memory.high to 95% of memory.max
for cgroup v2 containers.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The "create 100 slices" E2E test sometimes flaked with timeouts (e.g. only 95
out of 100 slices created). It created too much load for an E2E test.
The same test now uses ktesting as its API, which makes it possible to run it
as an integration test with the original 100 slices and as an E2E test with a
more moderate 10 slices.
(cherry picked from commit c47ad64820)
Manually pairing Begin with End is too error-prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.
By using a callback we can achieve a few things:
- Code using it automatically shadows the parent tCtx, thus enforcing
that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
in scope again.
Downsides:
- Extra boilerplate for the anonymous function.
Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
As an approximation of that `for tCtx := range tCtx.Step(...)` was
tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.
(cherry picked from commit 047682908d)
tCtx.Run and sub-tests make it much simpler to separate the different steps
than with Ginkgo because unless a test runs tCtx.Parallel (which we don't do
here), everything runs sequentially in a deterministic order.
Right now we get:
...
localupcluster.go:285: I1210 12:24:22.067524] bring up v1.34: stopping kubelet
localupcluster.go:285: I1210 12:24:22.067548] bring up v1.34: stopping kube-scheduler
localupcluster.go:285: I1210 12:24:22.067570] bring up v1.34: stopping kube-controller-manager
localupcluster.go:285: I1210 12:24:22.067589] bring up v1.34: stopping kube-apiserver
--- PASS: TestUpgradeDowngrade (94.78s)
--- PASS: TestUpgradeDowngrade/after-cluster-creation (2.07s)
--- PASS: TestUpgradeDowngrade/after-cluster-creation/core_DRA (2.05s)
--- PASS: TestUpgradeDowngrade/after-cluster-creation/ResourceClaim_device_status (0.02s)
--- PASS: TestUpgradeDowngrade/after-cluster-upgrade (4.10s)
--- PASS: TestUpgradeDowngrade/after-cluster-upgrade/core_DRA (4.09s)
--- PASS: TestUpgradeDowngrade/after-cluster-upgrade/ResourceClaim_device_status (0.01s)
--- PASS: TestUpgradeDowngrade/after-cluster-downgrade (1.24s)
--- PASS: TestUpgradeDowngrade/after-cluster-downgrade/core_DRA (1.21s)
--- PASS: TestUpgradeDowngrade/after-cluster-downgrade/ResourceClaim_device_status (0.02s)
PASS
It's even possible to use `-failfast` and
e.g. `-run=TestUpgradeDowngrade/after-cluster-creation/core_DRA`: `go test` then
runs everything up to that sub-test or any failing sub-test, then stops and
cleans up.
(cherry picked from commit de47714879)
The traditional behavior of PodIO was to ignore the context. Changing that to
use the canceled context was risky because some cleanup operation that
previously ran after cancellation of the context might no longer run.
However, this is theoretical. Tests all seemed to pass fine even without this
change.
This leverages ktesting as a wrapper around Ginkgo and testing.T to make all
helper code that is needed to deploy a DRA driver available to Go unit
tests and thus integration tests.
How to proceed with unifying helper code for integration and E2E testing is
open. This is just a minimal first step in that direction. Ideally, such
code should be in separate packages where usage of Ginkgo, e2e/framework
and gomega.Expect/Eventually/Consistently is forbidden.
While at it, the builder gets extended to make cleanup optional.
This will be needed for upgrade/downgrade testing with sub-tests.
(cherry picked from commit 7c7b1e1018)
It turned out that ginkgo.GinkgoT() wasn't as cheap as it should have been (fix
coming in Ginkgo 2.27.5). When instantiated once for each framework.Framework
instance during init by all workers at the same time, the resulting spike in
overall memory usage within the container caused OOM killing of workers in Prow
jobs like ci-kubernetes-e2e-gci-gce with very tight memory limits.
Even with the upcoming fix in Ginkgo it makes sense to set the TB field only
while it really is needed, i.e. while a test runs. This is conceptually similar
to setting and unsetting the test namespace. It may help to flush out incorrect
usage of TB outside of tests.
This makes it possible to call helper packages which expect a TContext from E2E
tests.
The implementation uses GinkgoT as TB and supports registering cleanup
callbacks which expect a context. These callbacks then run with a context that
comes from ginkgo.DeferCleanup, just as if they had called that directly.
(cherry picked from commit 47b613eded)
This approach with collecting results from callbacks in a main ginkgo.It and
using them as failures in separate ginkgo.It callbacks might be the best that
can be done with Ginkgo.
A better solution is probably Go unit tests with sub-tests.
(cherry picked from commit 65ef31973c)
Refactor fsType() to use a platform-specific formatFsType() helper that
translates filesystem magic numbers to human-readable names. On Linux,
tmpfs filesystems now display as "tmpfs" instead of the raw magic number.
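
A minimal sketch of such a translation, assuming the helper takes the magic
number as an integer and falls back to hexadecimal for unknown values. The
signature and the fallback format are assumptions, not the actual code; only
the tmpfs magic number (TMPFS_MAGIC from linux/magic.h) is a known constant.

```go
package main

import "fmt"

// tmpfsMagic is TMPFS_MAGIC from linux/magic.h.
const tmpfsMagic = 0x01021994

// formatFsType maps a filesystem magic number to a readable name.
// Hypothetical signature; unknown values are printed in hex (assumption).
func formatFsType(magic int64) string {
	switch magic {
	case tmpfsMagic:
		return "tmpfs"
	default:
		return fmt.Sprintf("0x%x", magic)
	}
}

func main() {
	fmt.Println(formatFsType(tmpfsMagic))
	fmt.Println(formatFsType(0x1234))
}
```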
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Update scheduler performance test examples to use `-benchtime=1x`
instead of `-benchtime=1ns` for explicitly running each benchmark
exactly once. This makes the intent clearer and aligns the examples
with recommended Go benchmark usage.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
The test expects unauthorized pods to be blocked from accessing cached
private images, but the default policy (NeverVerifyPreloadedImages)
allows access to any image previously pulled by the kubelet.
Configure the kubelet to use AlwaysVerify policy for this test, which
enforces credential checks for all images regardless of pull history.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
When a container restarts before kubelet restarts, containerMap has
multiple entries (old exited + new running). GetContainerID() may
return the exited container, causing the running check to fail. Fixed
by checking if ANY container for the pod/name is running.
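
The idea of the fix can be sketched as follows. The containerEntry type, the
anyContainerRunning function and the running set are hypothetical
simplifications for illustration; the kubelet's containerMap and its runtime
queries look different.

```go
package main

import "fmt"

// containerEntry is a hypothetical, simplified containerMap record.
type containerEntry struct {
	podUID string
	name   string
	id     string
}

// anyContainerRunning reports whether at least one container recorded for
// the given pod UID and container name is in the set of running IDs.
// Looking at all entries avoids picking a stale, exited container.
func anyContainerRunning(entries []containerEntry, podUID, name string, running map[string]bool) bool {
	for _, e := range entries {
		if e.podUID == podUID && e.name == name && running[e.id] {
			return true
		}
	}
	return false
}

func main() {
	// After a container restart there is one exited and one running entry.
	entries := []containerEntry{
		{podUID: "pod-1", name: "app", id: "old-exited"},
		{podUID: "pod-1", name: "app", id: "new-running"},
	}
	running := map[string]bool{"new-running": true}
	fmt.Println(anyContainerRunning(entries, "pod-1", "app", running))
}
```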
Also filter terminal pods from podresources since they no longer
consume resources, and fix test error handling to avoid exiting
Eventually immediately on transient errors.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Refactoring the DRA upgrade/downgrade testing such that it runs as a Go test
depended on supporting ktesting in the E2E framework. That change worked during
presubmit testing, but broke some periodic jobs. Therefore the relevant commits
from https://github.com/kubernetes/kubernetes/pull/135664/commits get reverted:
c47ad64820 DRA e2e+integration: test ResourceSlice controller
047682908d ktesting: replace Begin/End with TContext.Step
de47714879 DRA upgrade/downgrade: rewrite as Go unit test
7c7b1e1018 DRA e2e: make driver deployment possible in Go unit tests
65ef31973c DRA upgrade/downgrade: split out individual test steps
47b613eded e2e framework: support creating TContext
The last one is what must have caused the problem, but the other commits depend
on it.