Commit graph

3300 commits

Author SHA1 Message Date
Kubernetes Prow Robot
da05e3ccc7
Merge pull request #134369 from rbiamru/add-nodeconformance-mirror-pod
test/e2e_node: mark MirrorPod update tests as [NodeConformance]
2025-10-06 11:21:08 -07:00
Kubernetes Prow Robot
f74a7458d9
Merge pull request #134173 from toVersus/test/deflake-sidecar-e2e
deflake e2e: ensure pod with sidecars restarts in correct order after node reboot
2025-10-06 11:21:00 -07:00
HirazawaUi
f9a893be37 Fix incorrect error messages 2025-10-06 00:42:21 +08:00
rbiamru
6e574cabdd test/e2e_node: mark MirrorPod update tests as [NodeConformance] 2025-10-01 21:54:21 -04:00
Ciprian Hacman
2b3f1877be Update NPD to v1.34.0 2025-09-27 19:57:47 +03:00
Tsubasa Nagasawa
bc7ea997a0 deflake e2e: ensure pod with sidecars restarts in correct order after node reboot
Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>
Co-authored-by: Hironori Shiina <shiina.hironori@gmail.com>
2025-09-24 07:36:56 +09:00
Kubernetes Prow Robot
faf249a3f0
Merge pull request #134166 from pohly/dra-e2e-node-registrar-cleanup
DRA E2E node: fix cleanup of tests using separate registrar
2025-09-20 01:58:12 -07:00
Patrick Ohly
343a5db965 DRA E2E node: fix cleanup of tests using separate registrar
8fed05c5b7 fixed the cleanup of tests which start
registrar and service in a single call. But tests which first started the
registrar and then the service separately still had the problem:

- registrar is started with test context
- pods remain running at end of test
- registrar stops because of test context cancellation
- pods remain pending despite deletion because the driver gets
  unregistered (timing dependent, so this may have flaked)

The fix is to also clean up the registrar after the test, in reverse startup
order.
2025-09-19 18:47:21 +02:00
Ed Bartosh
871f87eaec e2e_node: test DRA plugin gRPC connection reuse
Added e2e_node test to verify that the Kubelet establishes only
a single gRPC connection with the DRA plugin for all service calls
during the plugin lifecycle.
The test uses a custom listener to count accepted connections and
asserts that only one connection is used for NodePrepareResources,
NodeUnprepareResources, and NodeWatchResources calls.
2025-09-17 19:15:01 +03:00
Kubernetes Prow Robot
f945dc402b
Merge pull request #134047 from pohly/dra-e2e-node-metric-flake
DRA E2E node: fix test cleanup
2025-09-14 19:08:08 -07:00
Patrick Ohly
8fed05c5b7 DRA E2E node: fix test cleanup
dd9917ddce fixed one test which did not wait for
pods to be deleted and then, depending on the timing, left ResourceClaims
prepared because the driver stopped before kubelet could call
NodeUnprepareResources.

But this is a more systematic issue that also affects other tests, so now any test
which starts a DRA plugin automatically uses the same common cleanup code:
- delete pods in the test namespace
- wait for the driver to not have any active ResourceClaims
- stop the driver
2025-09-12 18:43:35 +02:00
Karthik Bhat
4e907fad15 Explicitly set TerminationGracePeriodSeconds for mirror pod 2025-09-12 15:16:31 +05:30
Kubernetes Prow Robot
2d9ffdfec1
Merge pull request #134006 from pacoxu/kubelet-config-e2e
node_e2e: fix kubelet configuration setup
2025-09-11 20:08:06 -07:00
Paco Xu
455a437674 node_e2e: fix kubelet configuration setup 2025-09-12 09:26:17 +08:00
Kubernetes Prow Robot
7d8551ad4e
Merge pull request #133790 from Jpsassine/fix-leaky-resource-health-test
Fix flaky resource claim metrics test
2025-09-11 04:40:07 -07:00
Kubernetes Prow Robot
147143e348
Merge pull request #133955 from pacoxu/fix-kubelet-e2e
e2e_node kubelet configuration: merge feature gates and system-reserved items
2025-09-10 15:56:18 -07:00
Kubernetes Prow Robot
e9eef19379
Merge pull request #129240 from KevinTMtz/evict-terminated-pods-on-disk-pressure
E2e test for cleaning of terminated containers
2025-09-10 11:47:57 -07:00
John-Paul Sassine
dd9917ddce Fix flaky resource claim metrics test
The E2E node test "[DRA] Two resource Kubelet Plugins [Serial] must provide metrics" was failing flakily due to a race condition.

The preceding test, "should not add health status to Pod when feature gate is disabled," was leaking an in-use ResourceClaim. It deleted its pod but did not wait for the Kubelet to finish unpreparing the resources, leaving the `dra_resource_claims_in_use` metric at a non-zero value.

This commit makes the cleanup process synchronous: it now deletes the pod and explicitly waits for the `NodeUnprepareResources` gRPC call to complete, making sure resources are released before the test finishes.

Additionally, I fixed the cleanup logic in the `createHealthTestPodAndClaim` helper function to prevent a `DeviceClass` leak.
2025-09-10 18:21:35 +00:00
Paco Xu
1e3c3934cb e2e_node kubelet configuration: merge feature gates and system-reserved items 2025-09-09 15:19:36 +08:00
Kubernetes Prow Robot
90b03f1af0
Merge pull request #133910 from bitoku/fix-graceful-shutdown
Fix GracefulNodeShutdown perma failing test
2025-09-08 07:39:38 -07:00
Ayato Tokubi
5ed98e97e1 Remove getLocalNode to fix GracefulNodeShutdown e2e.
getLocalNode tries to get a ready node and fails if there is none.
The e2e test sends a termination signal to the kubelet, so having no ready nodes is expected. Because of this, the e2e test was permafailing.

Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2025-09-08 12:20:55 +00:00
Kubernetes Prow Robot
b508767369
Merge pull request #132655 from ylink-lfs/ci/httpd_removal
ci: remove httpd usage while using agnhost instead
2025-09-05 20:23:24 -07:00
Kubernetes Prow Robot
ef4add4509
Merge pull request #133356 from mayuka-c/issue-133175
Replace usage of deprecated ErrWaitTimeout with recommended method across all Pkgs
2025-09-05 06:43:34 -07:00
Kevin Torres
86e3ad233f Revert trapping TERM for podWithCommand 2025-09-04 18:46:54 +00:00
Kubernetes Prow Robot
4a79948217
Merge pull request #133473 from roycaihw/psi-cpu-pressure-test
PSI test: add a CPU limit of 500m to cpu-stress-pod
2025-09-03 14:03:15 -07:00
Kubernetes Prow Robot
5dff07fdf9
Merge pull request #133837 from saschagrunert/cni-plugins
Update CNI plugins to v1.8.0
2025-09-03 07:53:15 -07:00
Kubernetes Prow Robot
53fecc7748
Merge pull request #133720 from carlory/cleanup-SizeMemoryBackedVolumes
Drop SizeMemoryBackedVolumes after the feature GA-ed in 1.32
2025-09-02 15:33:13 -07:00
Sascha Grunert
f0be916f7a
Update CNI plugins to v1.8.0
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2025-09-02 08:39:57 +02:00
ylink-lfs
1fd7f308fc ci: remove httpd usage while using agnhost instead 2025-09-01 20:11:18 +08:00
Kubernetes Prow Robot
4f1ac4f7ac
Merge pull request #133659 from kannon92/fix-pod-resources-api
Fix panic in PodResources API test when FeatureGates is nil
2025-09-01 03:27:12 -07:00
Patrick Ohly
70cd76c5cf DRA e2e node: skipping resource health disabled test
`framework.WithLabel("[FeatureGate:ResourceHealthStatus:Disabled]")` has no
effect unless a job explicitly uses it in a --label-filter, which is not what
"generic" alpha/beta jobs are meant to do. The test therefore ran in the new
dra-alpha-beta job and failed because it expected the feature to be off.

In addition, the square brackets got added twice (once via the string
parameter, once by `framework.WithLabel`).

There is no generic way to filter out, in advance, tests which depend on feature
gates being turned off. In e2e_node tests the active feature gates can be
checked at runtime, so this is what the test now does.
2025-09-01 08:44:39 +02:00
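Both points from the commit above, the doubled square brackets and the runtime feature gate check, can be shown with a small sketch. The `withLabel` function here is a hypothetical stand-in that only mimics the bracket-wrapping behavior the commit message attributes to `framework.WithLabel`, and `shouldRunDisabledTest` is an invented name for the runtime check.

```go
package main

import "fmt"

// withLabel is a hypothetical stand-in for a label helper that wraps
// its argument in square brackets.
func withLabel(label string) string {
	return "[" + label + "]"
}

// shouldRunDisabledTest reports whether a test that expects a feature gate
// to be off should run, given the feature gates active at runtime.
func shouldRunDisabledTest(activeGates map[string]bool, gate string) bool {
	return !activeGates[gate]
}

func main() {
	// Passing a label that already has brackets doubles them.
	fmt.Println(withLabel("[FeatureGate:ResourceHealthStatus:Disabled]"))
	// Passing the bare label yields a single pair of brackets.
	fmt.Println(withLabel("FeatureGate:ResourceHealthStatus:Disabled"))

	// With the gate enabled, the "disabled" test should be skipped.
	gates := map[string]bool{"ResourceHealthStatus": true}
	fmt.Println(shouldRunDisabledTest(gates, "ResourceHealthStatus"))
}
```

A missing map entry reads as `false` in Go, so in this sketch a gate that is not listed at all is treated as disabled.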
carlory
36cf728281
Drop SizeMemoryBackedVolumes after the feature GA-ed in 1.32
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-08-29 14:01:39 +08:00
Francesco Romani
bf6a55cd06 e2e: node: address linter errors
remove now-unused code

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 13:24:02 +02:00
Francesco Romani
9aed0813e6 e2e: node: cpumgr: replace old testsuite
This final change in the series completes the transition
to the new test suite

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 12:02:27 +02:00
Francesco Romani
c4f7272f62 e2e: node: cpumgr: keep only scaffolding
Keep only the test stub: all the other code was already
removed by earlier PRs in the series because it is superseded by
code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Francesco Romani
37d678e098 e2e: node: cpumgr: remove old sidecar container tests
superseded by code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Francesco Romani
666dec8c2f e2e: node: cpumgr: remove old reserved cpus tests
superseded by code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Francesco Romani
d98069e22c e2e: node: cpumgr: remove old distribute-cpus tests
superseded by code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Francesco Romani
e2624d0cce e2e: node: cpumgr: remove old smt alignment tests
superseded by code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Francesco Romani
9e6073304f e2e: node: cpumgr: remove old cfs quota tests
superseded by code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-08-28 11:51:40 +02:00
Kubernetes Prow Robot
951c286433
Merge pull request #133462 from ffromani/e2e-node-cpumgr-cleanup-prepare
e2e: node: cpumanager: cleanup and tidification before test replacement
2025-08-27 18:29:46 -07:00
Kubernetes Prow Robot
3893eca5ce
Merge pull request #133432 from Jpsassine/kep4680-conform-device-plugin-feature
test: Standardize labels for ResourceHealthStatus e2e tests
2025-08-27 18:29:17 -07:00
Kubernetes Prow Robot
ea0b7f79ea
Merge pull request #133251 from HirazawaUi/fix-sidecar-container-tests
Fix sidecar containers flaky tests
2025-08-27 16:06:11 -07:00
Kubernetes Prow Robot
0cecf14cfe
Merge pull request #133206 from gavinkflam/131475-gocritic-issues-test-e2e-node
Fix gocritic issues - test/e2e_node
2025-08-27 16:05:56 -07:00
Kevin Torres
7cf39066b3 Remove sleepAfterExecuting param from diskConsumingPod 2025-08-27 18:24:18 +00:00
Kevin Torres
388046c3ea ImageGCTerminatedPodsContainersCleanup e2e node test
Updated ImageGCTerminatedPodsEviction to ImageGCTerminatedPodsContainersCleanup to test that
terminated containers are being cleaned up, instead of testing if terminated pods were being evicted
2025-08-27 18:24:11 +00:00
Kevin Torres
c9ccbae0d9 Remove terminated pods eviction code 2025-08-27 18:07:26 +00:00
Kevin Torres
2cad51f6c0 Add ImageGCTerminatedPodsEviction e2e node test
Removed TerminatedPodsEvictionOnDiskPressure, since it did not test the expected functionality

gofmt ./pkg/kubelet/eviction/eviction_manager_test.go
2025-08-27 18:03:21 +00:00
Kevin Torres
a59ce54d79 TerminatedPodsEvictionOnDiskPressure e2e node test 2025-08-27 18:01:56 +00:00
Kevin Hannon
4a597f50b4 Fix panic in PodResources API test when FeatureGates is nil
The test was panicking when trying to assign to a nil map in
initialConfig.FeatureGates["KubeletPodResourcesListUseActivePods"] = false.
Added nil check and map initialization to match the pattern used
elsewhere in the same file.

Fixes panic: internal/runtime/maps/runtime_faststr_swiss.go:265

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-22 10:37:47 -04:00
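The nil-map panic and its fix, as described in the commit above, follow a common Go pattern: writing to a nil map panics, so the map is initialized before the first assignment. A minimal sketch, where `kubeletConfig` is a hypothetical stand-in for the real configuration type and `setFeatureGate` is an invented helper:

```go
package main

import "fmt"

// kubeletConfig is a hypothetical stand-in for the configuration type;
// only the FeatureGates field matters for this illustration.
type kubeletConfig struct {
	FeatureGates map[string]bool
}

// setFeatureGate sets a feature gate, initializing the map first if it
// is nil. Assigning to a nil map would panic at runtime.
func setFeatureGate(cfg *kubeletConfig, name string, enabled bool) {
	if cfg.FeatureGates == nil {
		cfg.FeatureGates = make(map[string]bool)
	}
	cfg.FeatureGates[name] = enabled
}

func main() {
	cfg := &kubeletConfig{} // FeatureGates is nil here
	setFeatureGate(cfg, "KubeletPodResourcesListUseActivePods", false)
	fmt.Println(cfg.FeatureGates)
}
```

Reading from a nil map is safe in Go; only writes panic, which is why the problem can stay hidden until a code path first assigns an entry.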