Commit graph

3185 commits

Author SHA1 Message Date
Kubernetes Prow Robot
8f312e6fbf
Merge pull request #132348 from iholder101/swap/add-container-swap-limit-metric
[KEP-2400] Add a container_swap_limit_bytes metric
2025-07-16 20:02:30 -07:00
Kubernetes Prow Robot
9f545c5b46
Merge pull request #130992 from dshebib/addRegularContainerImageChangeToE2E_reverted
E2E Node Tests: Remove failing test from reverted PR
2025-07-16 20:02:23 -07:00
Ed Bartosh
e4320fe25c e2e_node: DRA: test handling fatal serving failures
Added an e2e_node test to verify that the DRA plugin and
registration services cancel the provided context when handling
fatal gRPC serving errors.
2025-07-16 15:49:41 +03:00
Ed Bartosh
ea05ad8887 e2e_node: DRA: add errorOnCloseListener
Introduce a mock net.Listener for tests that triggers a controlled
error on Close, enabling reliable simulation of gRPC server failures
in test scenarios.
2025-07-16 15:49:41 +03:00
Ed Bartosh
1981c985b1 e2e: DRA: support test and public options
Refactor StartPlugin and related test helpers to accept a variadic
list of options of any type, allowing both public and test-specific
options to be passed.
2025-07-16 15:49:41 +03:00
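A helper that accepts "a variadic list of options of any type" can sort the arguments with a type switch. Options the library understands are forwarded, and test-only options are applied locally. This is a sketch under that assumption. PublicOption, TestOption, testConfig and splitOptions are hypothetical stand-ins, not the real StartPlugin signature or kubeletplugin.Option.

```go
package main

import "fmt"

// PublicOption is a hypothetical stand-in for an option type that the
// plugin library itself understands.
type PublicOption struct{ Name, Value string }

// TestOption is a hypothetical stand-in for an option that only the test
// helper understands.
type TestOption func(*testConfig)

// testConfig holds settings consumed only by the test helper.
type testConfig struct {
	injectServeError bool
}

// splitOptions sorts variadic options of any type into the options to be
// forwarded to the library and the local test configuration. Unknown types
// are rejected instead of being silently ignored.
func splitOptions(opts ...any) ([]PublicOption, testConfig, error) {
	var public []PublicOption
	var cfg testConfig
	for _, opt := range opts {
		switch o := opt.(type) {
		case PublicOption:
			public = append(public, o)
		case TestOption:
			o(&cfg)
		default:
			return nil, testConfig{}, fmt.Errorf("unsupported option type %T", opt)
		}
	}
	return public, cfg, nil
}

func main() {
	withServeError := TestOption(func(c *testConfig) { c.injectServeError = true })
	public, cfg, err := splitOptions(PublicOption{Name: "socket", Value: "dra.sock"}, withServeError)
	fmt.Println(len(public), cfg.injectServeError, err)
}
```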
Ed Bartosh
169965350c e2e_node: Refactor DRA tests to use variadic options
Refactor the DRA e2e_node test helpers and test cases to accept
variadic kubeletplugin.Option arguments.

This change improves test flexibility and maintainability, allowing
new options to be passed in the future without requiring widespread
code changes.

There are no functional changes to test coverage or behavior.
2025-07-16 15:42:12 +03:00
Kubernetes Prow Robot
20344f9aba
Merge pull request #132345 from ffromani/e2e-podresourcesapi-labels
e2e: node: fix podresources API feature label
2025-07-15 13:16:29 -07:00
Kubernetes Prow Robot
394f412767
Merge pull request #132617 from aramase/aramase/f/kep_4412_pod_cache_key_type
Add ServiceAccountTokenCacheType support to credential provider plugin
2025-07-15 10:56:45 -07:00
Francesco Romani
05e1c4b489 e2e: node: fix podresources API feature label
We want to fix and enhance lanes which exercise
the podresources API tests. The first step is to clarify
the label and make it specific to the podresources API,
minimizing the clash and the ambiguity with the "PodLevelResources"
feature.

Note that we change the label name, but the new name is backward
compatible (filtering for "Feature:PodResources" will still
select the tests). This turns out not to be a problem because
these tests are no longer called out explicitly in the lane
definitions. We want to change this ASAP.

The new name is more specific and allows us to clearly
call out tests for this feature in the lane definitions.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-15 14:15:00 +02:00
Kubernetes Prow Robot
bf0be9fb56
Merge pull request #132028 from ffromani/podresources-list-active-pods
podresources: list: use active pods
2025-07-14 12:06:24 -07:00
Charles Wong
98c4514eae add e2e_node tests for uncore alignment 2025-07-11 10:32:01 -05:00
Anish Ramasekar
4d2566eb5a
credentialprovider: wire in service account mode cache type
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2025-07-10 14:50:54 -05:00
Francesco Romani
8f92a81787 node: e2e: podresources: add more e2e tests
add more e2e tests to cover the interaction with
core resource managers (cpu, memory) and to ensure
proper reporting.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-08 17:18:34 +02:00
Francesco Romani
380ed8d9b3 e2e: node: memory manager: build everywhere, run only on linux
Since KEP 4885
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md),
the memory manager is also supported on Windows.

Plus, we want to add podresources e2e tests which configure
the memory manager. Both these facts suggest it's useful to build
the e2e memory manager tests on all OSes, not just on Linux.

However, since we are not sure we are ready to run these tests
everywhere, we tag them LinuxOnly to preserve most of the
old behavior.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-08 17:18:34 +02:00
Itamar Holder
25d9d8d9ba refactor: use getLocalNode() to avoid code duplication
Signed-off-by: Itamar Holder <iholder@redhat.com>
2025-07-08 15:48:35 +03:00
Itamar Holder
bc9e8e1a91 add a context argument to prePodCreationModificationFunc()
Signed-off-by: Itamar Holder <iholder@redhat.com>
2025-07-08 15:45:42 +03:00
Itamar Holder
1ac60e35e9 e2e test: Add a container_swap_limit_bytes metric
Signed-off-by: Itamar Holder <iholder@redhat.com>
2025-07-08 12:38:18 +03:00
Kubernetes Prow Robot
09d99b7990
Merge pull request #132672 from iholder101/test/swap-delme-mod
Stabilize swap eviction priority test
2025-07-08 00:35:28 -07:00
Kubernetes Prow Robot
ee012e883f
Merge pull request #131641 from pohly/dra-kubelet-in-use-metric
DRA kubelet: add dra_resource_claims_in_use gauge vector
2025-07-07 03:11:26 -07:00
PatrickLaabs
0e8424fcf0 chore: depr. pointer pkg replacement for the e2e_node 2025-07-06 11:27:16 +02:00
Sascha Grunert
b464bbeb8f
Remove gogo-protobuf from CRI
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2025-07-04 08:55:57 +02:00
Itamar Holder
90bbce56b9 PriorityMemoryEvictionOrdering: allocate more memory when swap is provisioned
Whenever swap is provisioned on the node,
the kernel might be able to reclaim much more memory,
hence it is harder to bring the node under memory pressure.

This adds another container that allocates
the same amount of memory as the swap capacity to help
bring the node under memory pressure.

Signed-off-by: Itamar Holder <iholder@redhat.com>
2025-07-03 21:14:44 +03:00
Itamar Holder
25498cd34d Eviction tests: small refactor
This small refactor:
- Adds swap log statistics.
- Adds a pre pods modification function.

The latter can be used to modify
pods before they are created.

Signed-off-by: Itamar Holder <iholder@redhat.com>
2025-07-03 21:14:43 +03:00
Daniel Shebib
998776d80b remove breaking test 2025-07-01 23:06:05 -05:00
Kubernetes Prow Robot
b99ca3f736
Merge pull request #132498 from ffromani/e2e-serial-node-cpumanager-fix-ordered
e2e: serial: node cpumanager parity with the old suite
2025-07-01 07:15:31 -07:00
Patrick Ohly
6d6a749c62 DRA kubelet: add dra_resource_claims_in_use gauge vector
The new metric informs admins whether DRA in general (special "driver_name: <any>"
label) and/or specific DRA drivers (other label values) are in use on nodes.
This is useful to know because removing a driver is only safe if it is not in
use. If a driver gets removed while it has prepared a ResourceClaim,
unpreparing that ResourceClaim and stopping pods is blocked.

The metric implementation takes a read lock on the claim
info cache. It retrieves the claims in use and turns them into the metric.

The same code is also used to log changes in the claim info cache with
a diff. This hooks into a write update of the claim info cache and uses
contextual logging.

The unit tests check that metrics get calculated. The e2e_node test checks that
kubelet really exports the metrics data.

While at it, some bugs in claiminfo_test.go get fixed: the way the
test populated the cache no longer matched the code.
2025-06-26 14:31:03 +02:00
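The aggregation behind dra_resource_claims_in_use can be sketched as counting claims per driver, plus the special "<any>" label for claims in use regardless of driver, all under a read lock. This is a simplified, hypothetical model. The real kubelet code exports a Prometheus gauge vector, and its claim info cache has a different structure. The driver names below are invented.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// claimInfoCache is a hypothetical, simplified stand-in for the kubelet's
// claim info cache: prepared claim UID -> names of the drivers it uses.
// Each claim is assumed to list a driver at most once.
type claimInfoCache struct {
	mu     sync.RWMutex
	claims map[string][]string
}

// claimsInUse returns, per driver name, how many prepared claims use that
// driver. The special "<any>" key counts every claim in use.
func (c *claimInfoCache) claimsInUse() map[string]int {
	c.mu.RLock() // a read lock is enough: metrics collection only reads
	defer c.mu.RUnlock()
	counts := map[string]int{}
	for _, drivers := range c.claims {
		counts["<any>"]++
		for _, d := range drivers {
			counts[d]++
		}
	}
	return counts
}

func main() {
	cache := &claimInfoCache{claims: map[string][]string{
		"claim-a": {"gpu.example.com"},
		"claim-b": {"gpu.example.com", "net.example.com"},
	}}
	counts := cache.claimsInUse()
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	for _, k := range keys {
		fmt.Printf("driver_name=%s %d\n", k, counts[k])
	}
}
```

A driver whose count is zero (absent from the map) is not in use on the node. The commit message names that condition as the one under which removing the driver is safe.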
Kubernetes Prow Robot
dcefe0ef41
Merge pull request #132058 from pohly/dra-kubelet-connection-monitoring
DRA kubelet: connection monitoring
2025-06-26 03:40:29 -07:00
Kubernetes Prow Robot
1e59323e60
Merge pull request #132065 from yuanwang04/SwapMetrics
Fix pod and container level swap metrics for CRI
2025-06-25 16:22:28 -07:00
Sascha Grunert
0028ea8e99
Improve containers lifecycle test output parsing
This should fix the following test when running it with CRI-O:

```
[It] [sig-node] [Feature:SidecarContainers] [Serial] Containers
Lifecycle when A node running restartable init containers reboots should
restart the containers in right order with the proper phase after the
node reboot
```

The issue is that the message to be parsed contains outputs prefixed
with "unable to retrieve container logs for …". We now skip that part
and otherwise leave the current behavior untouched.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2025-06-25 08:51:29 +02:00
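Skipping the noisy prefixed lines before parsing can be sketched as a simple filter. Only the "unable to retrieve container logs for" prefix comes from the commit message. The helper name and the sample log lines are hypothetical, and the actual test may strip the prefix differently.

```go
package main

import (
	"fmt"
	"strings"
)

// noisePrefix is the diagnostic prefix mentioned in the commit message.
const noisePrefix = "unable to retrieve container logs for"

// parsableLines returns the non-empty lines of output, dropping any line
// that starts with noisePrefix so it cannot break the later parsing.
func parsableLines(output string) []string {
	var lines []string
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, noisePrefix) {
			continue
		}
		lines = append(lines, line)
	}
	return lines
}

func main() {
	out := "unable to retrieve container logs for cri-o://abc\n1 init-1 Starting\n2 init-1 Started\n"
	fmt.Println(parsableLines(out))
}
```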
Francesco Romani
3b0fd32810 e2e: serial: cpumanager: continue on failure
The `ginkgo.ContinueOnFailure` decorator serves the use case
of the new cpumanager tests perfectly:

https://onsi.github.io/ginkgo/#failure-handling-in-ordered-containers

"""
You can override this behavior by decorating an Ordered container with
ContinueOnFailure. This is useful in cases where Ordered is being used
to provide shared expensive set up for a collection of specs.
When ContinueOnFailure is set, Ginkgo will continue running specs even
if an earlier spec in the Ordered container has failed.
"""

And this is exactly the case at hand. Previously, without this
decorator, subsequent failures were masked, which is dangerous and not
what we want.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-24 15:46:06 +02:00
Francesco Romani
f76e1381d0 e2e: node: fix quota disablement testcases
Initially we added minimal quota disablement e2e tests,
but since the emergence of https://github.com/kubevirt/kubevirt/issues/14965
it became clear that it is better to have full coverage.

This PR restores coverage parity with the old test suite.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-24 15:46:01 +02:00
Ed Bartosh
cf544da6f7 e2e_node: DRA: add tests for different socket setups
Added tests to verify DRA functionality with 2 different socket
configurations:
- the same socket is used for the registration and the DRA service
- 2 separate sockets are used for the registration and the DRA service

Used table-driven ginkgo specs to avoid code duplication:
https://onsi.github.io/ginkgo/#table-driven-tests

This change enhances the robustness of the DRA e2e tests by
validating DRA behavior with different socket setups.
2025-06-24 10:42:45 +02:00
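The table-driven shape used for the two socket setups can be illustrated with a plain Go table and a single loop body. The actual tests use Ginkgo's DescribeTable/Entry. This stdlib-only sketch shows only how one set of checks is parameterized over configurations. The struct, the helper and the socket paths are all hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// socketSetup is one hypothetical table entry: where the registration
// socket and the DRA service socket live.
type socketSetup struct {
	name             string
	registrationSock string
	draSock          string
}

// sharesSocket reports whether both services use the same socket file.
func (s socketSetup) sharesSocket() bool {
	return filepath.Clean(s.registrationSock) == filepath.Clean(s.draSock)
}

func main() {
	table := []socketSetup{
		{name: "shared socket", registrationSock: "/plugins/test/dra.sock", draSock: "/plugins/test/dra.sock"},
		{name: "separate sockets", registrationSock: "/plugins_registry/test-reg.sock", draSock: "/plugins/test/dra.sock"},
	}
	// One loop body runs the same checks for every configuration, which is
	// what avoids duplicating the spec code.
	for _, tc := range table {
		fmt.Printf("%s: shared=%v\n", tc.name, tc.sharesSocket())
	}
}
```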
Ed Bartosh
7f6389e770 e2e_node: DRA: pass socket path as a parameter
Added the ability to specify the socket path for the DRA gRPC
service in the e2e node tests.

The PluginSocket option is added to allow setting the name
of the socket inside the directory where the DRA driver
creates the socket for the DRA gRPC calls. This is used by
the kubelet to connect to the DRA plugin.

The newDRAService and newRegistrar functions are updated to
accept a socketPath parameter, which is used to configure
the PluginDataDirectoryPath and PluginSocket options for the
DRA plugin.

This change enables more flexible configuration of the DRA
plugin in e2e tests, allowing for testing with different
socket paths.
2025-06-24 10:42:45 +02:00
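Deriving the two option values from one socketPath parameter can be sketched as splitting the path into its directory and file name. The split assumes the data directory option takes the directory and the socket option takes the bare file name. The helper name and the example path are hypothetical, and the real newDRAService/newRegistrar code may compute the values differently.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pluginSocketOptions is a hypothetical helper that splits a full socket
// path into a directory (for a data-directory option such as
// PluginDataDirectoryPath) and a file name (for a socket-name option such
// as PluginSocket).
func pluginSocketOptions(socketPath string) (dataDir, socketName string) {
	return filepath.Dir(socketPath), filepath.Base(socketPath)
}

func main() {
	dir, name := pluginSocketOptions("/var/lib/kubelet/plugins/test-driver/dra.sock")
	fmt.Println(dir, name)
}
```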
Ed Bartosh
c90c2e0d40 kubelet: DRA: fix linter warnings
Fixed the following warnings:
dra_test.go:884:2: singleCaseSwitch: should rewrite switch statement to if statement (gocritic)
	switch podName {
	^
dra_test.go:686:4: SA4006: this value of kubeletPlugin is never used (staticcheck)
	kubeletPlugin = newDRAService(ctx, f.ClientSet, nodeName, driverName)
        ^
2025-06-24 10:42:45 +02:00
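The gocritic singleCaseSwitch warning above asks for a switch with a single case to be rewritten as an if statement. Here is a minimal before/after sketch of that rewrite. The function and pod names are hypothetical and not taken from dra_test.go.

```go
package main

import "fmt"

// describeBefore uses a switch with a single case, which gocritic's
// singleCaseSwitch check flags.
func describeBefore(podName string) string {
	switch podName {
	case "pod-with-claim":
		return "has claim"
	}
	return "no claim"
}

// describeAfter is the equivalent if statement suggested by the linter.
func describeAfter(podName string) string {
	if podName == "pod-with-claim" {
		return "has claim"
	}
	return "no claim"
}

func main() {
	for _, name := range []string{"pod-with-claim", "other-pod"} {
		fmt.Println(name, describeBefore(name), describeAfter(name))
	}
}
```

The SA4006 warning is fixed in a similar spirit, by not assigning a value that is never read.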
Ed Bartosh
4ee7374b24 DRA kubelet: add connection monitoring
This ensures that ResourceSlices also get removed when a plugin becomes
unresponsive without removing the registration socket.

Tests are from https://github.com/kubernetes/kubernetes/pull/131073 by Ed,
with some modifications; the implementation is new.
2025-06-24 10:42:41 +02:00
Yuan Wang
c5f061e0df Fix pod and container level swap metrics for CRI 2025-06-23 17:57:12 +00:00
Kubernetes Prow Robot
54291a55c2
Merge pull request #132096 from pohly/dra-kubelet-refactoring
DRA kubelet: refactoring
2025-06-13 04:45:09 -07:00
Kubernetes Prow Robot
8afdc5583f
Merge pull request #132215 from ffromani/e2e-serial-cpumgr-crio-fix
e2e: node: serial: fix cgroup path with crio
2025-06-11 04:04:57 -07:00
Francesco Romani
b39741b506 e2e: node: serial: fix cgroup path with crio
the path construction with crio is wrong (typo).

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-10 19:25:48 +02:00
Patrick Ohly
494a129d02 DRA kubelet: clarify plugin vs, driver name
The rest of the system logs information using "driverName" as key in structured
logging. The kubelet should do the same.

This also gets clarified in the code, together with using a
consistent name for a Plugin pointer: "plugin" instead of "client" or
"instance".

The "New" in NewDRAPluginClient made no sense because the function does not
construct anything, and it returns a plugin, not a client, hence the rename
to GetDRAPlugin.
2025-06-06 18:24:33 +02:00
Kubernetes Prow Robot
8bcc78c7bf
Merge pull request #132067 from bzsuni/bz/npd/update/0.8.21
Update npd from v0.8.20 to v0.8.21
2025-06-05 11:58:38 -07:00
Kubernetes Prow Robot
6188e5cb7b
Merge pull request #132101 from haircommander/restart-flake
e2e_node: verify restart looping container correctly
2025-06-04 13:40:50 -07:00
Kubernetes Prow Robot
6eaef7b0d6
Merge pull request #131969 from skitt/test-e2e-pkg-errors
test: drop dependency on github.com/pkg/errors
2025-06-04 12:16:38 -07:00
Peter Hunt
daae472fe1 e2e_node: verify restart looping container correctly
When a test verifies that a container has restarted, it uses a continually exiting
container. Checking that the number of restarts equals an exact value (rather than
treating it as a lower bound) introduces a race between the container restarting
and the status observation.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2025-06-04 13:27:50 -04:00
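Suppose the fix amounts to treating the expected restart count as a lower bound. Then the check reduces to the small predicate sketched below. The function name is hypothetical, and the real test expresses this through its own framework and Gomega helpers.

```go
package main

import "fmt"

// restartedAtLeast reports whether the observed restart count shows that the
// container restarted at least `expected` times. A continually exiting
// container may restart again between the restart and the status read, so
// an exact-equality check (observed == expected) would be racy.
func restartedAtLeast(observed, expected int32) bool {
	return observed >= expected
}

func main() {
	// Expected one restart, but the container already restarted again by
	// the time the status was read: still a pass.
	fmt.Println(restartedAtLeast(2, 1))
}
```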
Kubernetes Prow Robot
1c56fff49b
Merge pull request #132077 from ffromani/e2e-node-cgroup-v2-only
e2e: node: cpumanager: require cgroup v2
2025-06-03 12:10:46 -07:00
Kubernetes Prow Robot
9819f760f0
Merge pull request #131991 from SergeyKanzhelev/clarifyTheTokenScope
Clarified the token scope and future plans for the next security scan…
2025-06-03 10:02:38 -07:00
Francesco Romani
7e7aa6d810 e2e: node: cpumanager: require cgroup v2
In general, the rewritten e2e cpumanager tests assume cgroup v2.
A limited set of them may be updated to also work with the
obsolete and declining cgroup v1, but these need to be reviewed
on a test-by-test basis.

To fix test failures, we add a top-level requirement for cgroup v2,
skipping the tests otherwise. This will fix the red lanes while we review
the testcases and the deprecation plan of the other tests.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-03 18:22:48 +02:00
bzsuni
b9d9dea03f Update npd from v0.8.20 to v0.8.21
Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>
2025-06-03 16:08:29 +08:00
Sergey Kanzhelev
a512de6e09 Clarified the token scope and future plans for the next security scan to refer to it 2025-06-02 16:53:10 +00:00
Stephen Kitt
545fbc99c2
test: drop dependency on github.com/pkg/errors
The package is unmaintained, and the tests don't rely on the
functionality it provides on top of Golang errors (stack traces).

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2025-06-02 11:27:09 +02:00