8fed05c5b7 fixed the cleanup of tests which start
registrar and service in a single call. But tests which first started the
registrar and then the service separately still had the problem:
- registrar is started with test context
- pods remain running at end of test
- registrar stops because of test context cancellation
- pods remain pending despite deletion because the driver gets
unregistered (timing dependent, so this may have flaked)
The fix is to also clean up the registrar after the test, in reverse startup
order.
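A minimal sketch of stopping components in reverse startup order. The `cleanupStack` type and `simulate` function are hypothetical names used only for illustration; they are not the helpers used by the actual tests.

```go
package main

import "fmt"

// cleanupStack is a hypothetical helper: each component registers a stop
// function when it starts, and runAll invokes them in reverse order.
type cleanupStack struct {
	stops []func()
}

// add registers the stop function of a component that was just started.
func (c *cleanupStack) add(stop func()) {
	c.stops = append(c.stops, stop)
}

// runAll stops components in reverse startup order and clears the stack.
func (c *cleanupStack) runAll() {
	for i := len(c.stops) - 1; i >= 0; i-- {
		c.stops[i]()
	}
	c.stops = nil
}

// simulate starts a registrar, then a service, and returns the order in
// which they get stopped.
func simulate() []string {
	var order []string
	var cs cleanupStack
	cs.add(func() { order = append(order, "registrar") }) // started first
	cs.add(func() { order = append(order, "service") })   // started second
	cs.runAll()
	return order
}

func main() {
	fmt.Println(simulate()) // [service registrar]
}
```

With this ordering the registrar is the last component to stop, so the driver stays registered while pods are being deleted.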
Added e2e_node test to verify that the Kubelet establishes only
a single gRPC connection with the DRA plugin for all service calls
during the plugin lifecycle.
The test uses a custom listener to count accepted connections and
asserts that only one connection is used for NodePrepareResources,
NodeUnprepareResources, and NodeWatchResources calls.
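The counting approach can be sketched as a thin wrapper around `net.Listener`. This is an illustration under assumptions, not the test's code: `countingListener` and `countAfterDials` are hypothetical names, and the sketch uses a loopback TCP socket instead of the plugin's real endpoint.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// countingListener wraps a net.Listener and counts successful Accept calls.
type countingListener struct {
	net.Listener
	accepted atomic.Int64
}

// Accept delegates to the wrapped listener and increments the counter for
// every connection that was accepted without error.
func (l *countingListener) Accept() (net.Conn, error) {
	conn, err := l.Listener.Accept()
	if err == nil {
		l.accepted.Add(1)
	}
	return conn, err
}

// countAfterDials opens n client connections one after another and returns
// how many connections the listener accepted.
func countAfterDials(n int) (int64, error) {
	inner, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	l := &countingListener{Listener: inner}
	defer l.Close()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", l.Addr().String())
		if err != nil {
			return 0, err
		}
		server, err := l.Accept()
		if err != nil {
			client.Close()
			return 0, err
		}
		client.Close()
		server.Close()
	}
	return l.accepted.Load(), nil
}

func main() {
	n, err := countAfterDials(3)
	fmt.Println(n, err)
}
```

A test in the spirit of the one described here would hand such a listener to the gRPC server and assert that the counter equals one after all service calls have been made.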
dd9917ddce fixed one test which did not wait for
pods to be deleted and then, depending on the timing, left ResourceClaims
prepared because the driver stopped before kubelet could call
NodeUnprepareResources.
But this is a more systematic issue that also affects other tests, so now any test
which starts a DRA plugin automatically uses the same common cleanup code:
- delete pods in the test namespace
- wait for the driver to not have any active ResourceClaims
- stop the driver
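The three cleanup steps can be sketched as follows. All names here (`waitForZero`, `cleanup`, `simulateCleanup`) are hypothetical, the callbacks stand in for the real pod deletion and driver calls, and the timeout values are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForZero polls count until it returns 0 or the timeout expires.
func waitForZero(count func() int, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for count() != 0 {
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for active ResourceClaims to reach zero")
		}
		time.Sleep(interval)
	}
	return nil
}

// cleanup deletes pods, waits until the driver reports no active claims,
// and only then stops the driver. In this sketch a timeout returns early
// without stopping the driver; that choice is an assumption.
func cleanup(deletePods func(), activeClaims func() int, stopDriver func()) error {
	deletePods()
	if err := waitForZero(activeClaims, time.Second, time.Millisecond); err != nil {
		return err
	}
	stopDriver()
	return nil
}

// simulateCleanup runs cleanup against a fake driver that starts with two
// prepared claims and releases one per poll.
func simulateCleanup() ([]string, error) {
	var steps []string
	claims := 2
	err := cleanup(
		func() { steps = append(steps, "delete pods") },
		func() int {
			if claims > 0 {
				claims-- // pretend the kubelet unprepared one claim
			}
			return claims
		},
		func() { steps = append(steps, "stop driver") },
	)
	return steps, err
}

func main() {
	steps, err := simulateCleanup()
	fmt.Println(steps, err)
}
```

The important property is the ordering: the driver is stopped only after the claim count has dropped to zero, so the kubelet always has a running driver to call NodeUnprepareResources on.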
The E2E node test "[DRA] Two resource Kubelet Plugins [Serial] must provide metrics" was failing flakily due to a race condition.
The preceding test, "should not add health status to Pod when feature gate is disabled," was leaking an in-use ResourceClaim. It deleted its pod but did not wait for the Kubelet to finish unpreparing the resources, leaving the `dra_resource_claims_in_use` metric at a non-zero value.
This commit makes the cleanup process synchronous: it now deletes the pod and explicitly waits for the `NodeUnprepareResources` gRPC call to complete, making sure resources are released before the test finishes.
Additionally, I fixed the cleanup logic in the `createHealthTestPodAndClaim` helper function to prevent a `DeviceClass` leak.
getLocalNode tries to get a ready node and fails if there is none.
The e2e test sends a termination signal to the kubelet, after which no ready nodes are expected. Because of this, the test was permafailing.
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
`framework.WithLabel("[FeatureGate:ResourceHealthStatus:Disabled]")` has no
effect unless a job explicitly uses it in a --label-filter, which is not what
"generic" alpha/beta jobs are meant to do. The test therefore ran in the new
dra-alpha-beta job and failed because it expected the feature to be off.
In addition, the square brackets got added twice (once via the string
parameter, once by `framework.WithLabel`).
There is no generic way to filter out tests in advance which depend on feature
gates to be turned off. In e2e_node tests the active feature gates can be
checked at runtime, so this is what the test now does.
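A minimal sketch of such a runtime check. The `skipReason` function and the plain `map[string]bool` of gates are assumptions for illustration; the real test queries the active feature gates through the test framework instead.

```go
package main

import "fmt"

// skipReason returns a non-empty reason when a test that requires the named
// feature gate to be disabled must be skipped because the gate is enabled.
// A gate that is missing from the map is treated as disabled.
func skipReason(gates map[string]bool, gate string) string {
	if gates[gate] {
		return fmt.Sprintf("feature gate %s is enabled, test requires it to be disabled", gate)
	}
	return ""
}

func main() {
	gates := map[string]bool{"ResourceHealthStatus": true}
	if reason := skipReason(gates, "ResourceHealthStatus"); reason != "" {
		fmt.Println("SKIP:", reason)
		return
	}
	fmt.Println("running test")
}
```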
Keep only the test stub: all the other code was already
removed by PR in the series because it was superseded by
code in cpumanager_test.go,
which will be moved to cpu_manager_test.go at the
end of this series.
Split to make the review easier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Updated ImageGCTerminatedPodsEviction to ImageGCTerminatedPodsContainersCleanup to test that
terminated containers are cleaned up, instead of testing whether terminated pods are evicted.
The test was panicking when trying to assign to a nil map in
initialConfig.FeatureGates["KubeletPodResourcesListUseActivePods"] = false.
Added nil check and map initialization to match the pattern used
elsewhere in the same file.
Fixes panic: internal/runtime/maps/runtime_faststr_swiss.go:265
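The nil check follows the usual Go pattern for maps. In this sketch the `config` type and the `setGate` helper are hypothetical stand-ins; only the `FeatureGates` field name and the gate name come from the description above.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the kubelet configuration; only the
// FeatureGates field matters for this example.
type config struct {
	FeatureGates map[string]bool
}

// setGate initializes the map when it is nil before assigning, because
// writing to a nil map panics at runtime.
func setGate(c *config, name string, enabled bool) {
	if c.FeatureGates == nil {
		c.FeatureGates = make(map[string]bool)
	}
	c.FeatureGates[name] = enabled
}

func main() {
	c := &config{} // FeatureGates starts out nil
	setGate(c, "KubeletPodResourcesListUseActivePods", false)
	fmt.Println(c.FeatureGates) // map[KubeletPodResourcesListUseActivePods:false]
}
```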
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>