Introduce a mock net.Listener for tests that triggers a controlled
error on Close, enabling reliable simulation of gRPC server failures
in test scenarios.
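A minimal sketch of such a listener, for illustration only. The names
mockListener, newMockListener and errCloseFailed are hypothetical and not
taken from the actual change; the sketch assumes the mock does not need to
hand out real connections.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
)

// errCloseFailed is a hypothetical sentinel error for this sketch.
var errCloseFailed = errors.New("simulated close failure")

// mockListener is a hypothetical in-memory net.Listener. It never produces
// connections; Accept blocks until Close is called.
type mockListener struct {
	closeErr error
	closed   chan struct{}
	once     sync.Once
}

func newMockListener(closeErr error) *mockListener {
	return &mockListener{closeErr: closeErr, closed: make(chan struct{})}
}

// Accept blocks until the listener is closed, then reports net.ErrClosed.
func (l *mockListener) Accept() (net.Conn, error) {
	<-l.closed
	return nil, net.ErrClosed
}

// Close unblocks Accept and always returns the injected error.
func (l *mockListener) Close() error {
	l.once.Do(func() { close(l.closed) })
	return l.closeErr
}

// Addr returns a placeholder address; nothing is bound on the system.
func (l *mockListener) Addr() net.Addr {
	return &net.UnixAddr{Name: "mock.sock", Net: "unix"}
}

// Compile-time check that mockListener satisfies net.Listener.
var _ net.Listener = (*mockListener)(nil)

func main() {
	l := newMockListener(errCloseFailed)
	err := l.Close()
	fmt.Println("Close returned the injected error:", errors.Is(err, errCloseFailed))
}
```

A server that is handed this listener sees a deterministic error when it
shuts down, without any real socket being involved.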
Refactor StartPlugin and related test helpers to accept a variadic
list of options of any type, allowing both public and test-specific
options to be passed.
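One way to accept options of any type is a type switch over a variadic
`...any` parameter. The sketch below is an assumption about the general
shape, not the real StartPlugin signature; config, Option, testOption,
WithSocketName, withFailingClose and buildConfig are hypothetical names.

```go
package main

import "fmt"

// config is a hypothetical settings struct for this sketch.
type config struct {
	socketName string
	failClose  bool
}

// Option stands in for a public option type, testOption for a test-only one.
type Option func(*config)
type testOption func(*config)

// WithSocketName is a hypothetical public option.
func WithSocketName(name string) Option {
	return func(c *config) { c.socketName = name }
}

// withFailingClose is a hypothetical test-only option.
func withFailingClose() testOption {
	return func(c *config) { c.failClose = true }
}

// buildConfig applies every option it recognizes and rejects anything else,
// so a wrong argument type fails loudly instead of being ignored.
func buildConfig(opts ...any) (config, error) {
	c := config{socketName: "dra.sock"}
	for _, o := range opts {
		switch opt := o.(type) {
		case Option:
			opt(&c)
		case testOption:
			opt(&c)
		default:
			return config{}, fmt.Errorf("unsupported option type %T", o)
		}
	}
	return c, nil
}

func main() {
	c, err := buildConfig(WithSocketName("custom.sock"), withFailingClose())
	fmt.Println(c.socketName, c.failClose, err)
}
```

The trade-off is that option types are checked at run time rather than at
compile time, which is why the default branch returns an error.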
Refactor the DRA e2e_node test helpers and test cases to accept
variadic kubeletplugin.Option arguments.
This change improves test flexibility and maintainability, allowing
new options to be passed in the future without requiring widespread
code changes.
There are no functional changes to test coverage or behavior.
We want to fix and enhance the lanes which exercise
the podresources API tests. The first step is to clarify
the label and make it specific to the podresources API,
minimizing the clash and the ambiguity with the "PodLevelResources"
feature.
Note that we change the label name, but the new name is backward
compatible (filtering for "Feature:PodResources" will still
get the tests). This turns out not to be a problem because
these tests are no longer called out explicitly in the lane
definitions. We want to change this ASAP.
The new name is more specific and allows us to clearly
call out tests for this feature in the lane definitions.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add more e2e tests to cover the interaction with
core resource managers (cpu, memory) and to ensure
proper reporting.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Since KEP 4885
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md)
the memory manager is also supported on Windows.
In addition, we want to add podresources e2e tests which configure
the memory manager. Both facts suggest it is useful to build
the e2e memory manager tests on all OSes, not just on Linux.
However, since we are not sure we are ready to run these tests
everywhere, we tag them LinuxOnly to preserve most of the
old behavior.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Whenever swap is provisioned on the node,
the kernel might be able to reclaim much more memory,
hence it is harder to get the node to be memory pressured.
This will add another container that allocates
the same amount as the swap capacity to help
bring the node to memory pressure.
Signed-off-by: Itamar Holder <iholder@redhat.com>
This small refactor:
- Adds swap log statistics.
- Adds a pre pods modification function.
The latter can be used to make
changes to pods before creation.
Signed-off-by: Itamar Holder <iholder@redhat.com>
The new metric informs admins whether DRA in general (special "driver_name: <any>"
label) and/or specific DRA drivers (other label values) are in use on nodes.
This is useful to know because removing a driver is only safe if it is not in
use. If a driver gets removed while it has prepared a ResourceClaim,
unpreparing that ResourceClaim and stopping pods is blocked.
The implementation of the metric uses read locking of the claim
info cache. It retrieves "claims in use" and turns those into the metric.
The same code is also used to log changes in the claim info cache with
a diff. This hooks into a write update of the claim info cache and uses
contextual logging.
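A rough sketch of how "claims in use" could be derived under a read lock.
This is not the kubelet implementation: claimInfoCache, add, claimsInUse and
the map layout are hypothetical, and the sketch assumes each claim lists a
driver at most once.

```go
package main

import (
	"fmt"
	"sync"
)

// anyDriver mirrors the special "<any>" label value from the description.
const anyDriver = "<any>"

// claimInfoCache is a hypothetical stand-in for the real cache.
type claimInfoCache struct {
	mu sync.RWMutex
	// claims maps a claim name to the drivers that prepared it.
	claims map[string][]string
}

// add takes the write lock because it mutates the cache.
func (c *claimInfoCache) add(claim string, drivers ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.claims[claim] = drivers
}

// claimsInUse only reads, so a read lock is enough. It returns the number of
// claims per driver plus the total under the "<any>" key.
func (c *claimInfoCache) claimsInUse() map[string]int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	counts := map[string]int{anyDriver: len(c.claims)}
	for _, drivers := range c.claims {
		for _, d := range drivers {
			counts[d]++
		}
	}
	return counts
}

func main() {
	cache := &claimInfoCache{claims: map[string][]string{}}
	cache.add("claim-a", "gpu.example.com")
	cache.add("claim-b", "gpu.example.com", "nic.example.com")
	fmt.Println(cache.claimsInUse())
}
```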
The unit tests check that metrics get calculated. The e2e_node test checks that
kubelet really exports the metrics data.
While at it, some bugs in claiminfo_test.go get fixed: the way the
cache got populated in the test no longer matched the code.
This should fix the following test when running it with CRI-O:
```
[It] [sig-node] [Feature:SidecarContainers] [Serial] Containers
Lifecycle when A node running restartable init containers reboots should
restart the containers in right order with the proper phase after the
node reboot
```
The issue is that we have prefixed "unable to retrieve container logs
for …" outputs in the message to be parsed. We now skip that part and
leave the current behavior untouched.
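As an illustration only, skipping such output could look like the sketch
below. It assumes the message is line-oriented and that the unwanted part
occupies whole lines; stripLogRetrievalErrors and skipPrefix are hypothetical
names, and the sample input is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// skipPrefix is the start of the runtime output that must not be parsed.
const skipPrefix = "unable to retrieve container logs for"

// stripLogRetrievalErrors drops every line that starts with skipPrefix and
// returns the remaining lines unchanged and in their original order.
func stripLogRetrievalErrors(msg string) string {
	var kept []string
	for _, line := range strings.Split(msg, "\n") {
		if strings.HasPrefix(line, skipPrefix) {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	// The container ID and payload lines here are invented sample data.
	msg := "unable to retrieve container logs for abc123\nfirst\nsecond"
	fmt.Println(stripLogRetrievalErrors(msg))
}
```

Input without the prefix passes through untouched, which matches the goal of
keeping the current behavior for other runtimes.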
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The `ginkgo.ContinueOnFailure` decorator serves the use case
of the new cpumanager tests perfectly:
https://onsi.github.io/ginkgo/#failure-handling-in-ordered-containers
"""
You can override this behavior by decorating an Ordered container with
ContinueOnFailure. This is useful in cases where Ordered is being used
to provide shared expensive set up for a collection of specs.
When ContinueOnFailure is set, Ginkgo will continue running specs even
if an earlier spec in the Ordered container has failed.
"""
And this is exactly the case at hand. Previously, without this
decorator, subsequent failures were masked, which is dangerous and not
what we want.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Initially we added minimal quota disablement e2e tests,
but since the emergence of https://github.com/kubevirt/kubevirt/issues/14965
it became clear that it is better to have full coverage.
This PR restores coverage parity with the old test suite.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Added tests to verify DRA functionality with 2 different socket
configurations:
- the same socket is used for the registration and the DRA service
- 2 separate sockets are used for the registration and the DRA service
Used table-driven ginkgo specs to avoid code duplication:
https://onsi.github.io/ginkgo/#table-driven-tests
This change enhances the robustness of the DRA e2e tests by
validating its behavior with different socket setups.
Added an ability to specify the socket path for the DRA gRPC
service in the e2e node tests.
The PluginSocket option is added to allow setting the name
of the socket inside the directory where the DRA driver
creates the socket for the DRA gRPC calls. This is used by
the kubelet to connect to the DRA plugin.
The newDRAService and newRegistrar functions are updated to
accept a socketPath parameter, which is used to configure
the PluginDataDirectoryPath and PluginSocket options for the
DRA plugin.
This change enables more flexible configuration of the DRA
plugin in e2e tests, allowing for testing with different
socket paths.
Fixed the following warnings:
dra_test.go:884:2: singleCaseSwitch: should rewrite switch statement to if statement (gocritic)
switch podName {
^
dra_test.go:686:4: SA4006: this value of kubeletPlugin is never used (staticcheck)
kubeletPlugin = newDRAService(ctx, f.ClientSet, nodeName, driverName)
^
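For reference, the singleCaseSwitch warning is about the pattern sketched
below. The functions and the pod name are hypothetical and not the code from
dra_test.go; both variants behave the same.

```go
package main

import "fmt"

// describeWithSwitch is the shape gocritic flags: a switch with one case.
func describeWithSwitch(podName string) string {
	switch podName {
	case "target-pod":
		return "matched"
	}
	return "other"
}

// describeWithIf is the equivalent form using a plain if statement.
func describeWithIf(podName string) string {
	if podName == "target-pod" {
		return "matched"
	}
	return "other"
}

func main() {
	for _, name := range []string{"target-pod", "another-pod"} {
		fmt.Println(name, describeWithSwitch(name), describeWithIf(name))
	}
}
```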
This ensures that ResourceSlices get removed also when a plugin becomes
unresponsive without removing the registration socket.
Tests are from https://github.com/kubernetes/kubernetes/pull/131073 by Ed
with some modifications; the implementation is new.
The rest of the system logs information using "driverName" as key in structured
logging. The kubelet should do the same.
This also gets clarified in the code, together with using a
consistent name for a Plugin pointer: "plugin" instead of "client" or
"instance".
The New in NewDRAPluginClient made no sense because it's not constructing
anything, and it returns a plugin, not a client, so it is renamed to GetDRAPlugin.
When a test verifies that a container has restarted, we use a continually exiting
container. Checking that the number of restarts is exactly equal to the expected value, rather than
using an inequality, introduces a race between the container restarting and the status observation.
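The difference can be sketched as follows. The helper names are hypothetical;
the int32 type matches the RestartCount field of a container status.

```go
package main

import "fmt"

// restartedExactly is the racy check: a continually exiting container may
// restart again between the restart we wait for and the status read.
func restartedExactly(observed, want int32) bool {
	return observed == want
}

// restartedAtLeast tolerates extra restarts that happen before the status is
// observed, so it does not depend on timing.
func restartedAtLeast(observed, want int32) bool {
	return observed >= want
}

func main() {
	// Suppose the test waits for 2 restarts but the status already shows 3.
	var observed, want int32 = 3, 2
	fmt.Println("exact check passes:", restartedExactly(observed, want))
	fmt.Println("at-least check passes:", restartedAtLeast(observed, want))
}
```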
Signed-off-by: Peter Hunt <pehunt@redhat.com>
In general, the rewritten e2e cpumanager tests assume cgroup v2.
A limited set of these may be updated to also work with the
obsolete and declining cgroup v1, but these need to be reviewed
on a test-by-test basis.
To fix test failures, we add a top level require for cgroup v2,
skipping otherwise. This will fix the red lanes while we review
the testcases and the deprecation plan of the other tests.
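A common way to detect cgroup v2 is to look for the cgroup.controllers file
at the root of the unified hierarchy. The sketch below shows that check in
isolation. isCgroupV2 is a hypothetical helper, not the one used by the
tests, and /sys/fs/cgroup is assumed to be the mount point.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isCgroupV2 reports whether root looks like a cgroup v2 (unified) mount by
// checking for the cgroup.controllers file that v2 places in its root.
func isCgroupV2(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

func main() {
	// /sys/fs/cgroup is the conventional mount point; other layouts exist.
	if !isCgroupV2("/sys/fs/cgroup") {
		fmt.Println("cgroup v2 not detected, a test would skip here")
		return
	}
	fmt.Println("cgroup v2 detected")
}
```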
Signed-off-by: Francesco Romani <fromani@redhat.com>
The package is unmaintained, and the tests don't rely on the
functionality it provides on top of Golang errors (stack traces).
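For code that only needs wrapping and matching, the standard library is
enough, as in this sketch. errNotReady, checkPlugin and isNotReady are
hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReady is a hypothetical sentinel error.
var errNotReady = errors.New("plugin not ready")

// checkPlugin wraps the sentinel with context using the %w verb.
func checkPlugin(name string, ready bool) error {
	if !ready {
		return fmt.Errorf("checking plugin %q: %w", name, errNotReady)
	}
	return nil
}

// isNotReady matches the sentinel anywhere in the wrap chain.
func isNotReady(err error) bool {
	return errors.Is(err, errNotReady)
}

func main() {
	err := checkPlugin("example-driver", false)
	fmt.Println(err)
	fmt.Println("matches sentinel:", isNotReady(err))
}
```

What is lost compared to a stack-trace-capturing package is only the stack
trace itself, which the tests do not rely on.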
Signed-off-by: Stephen Kitt <skitt@redhat.com>