Test_isSchedulableAfterClaimChange was sensitive to system load because of the
arbitrary delay when waiting for the assume cache to catch up. Running inside
a synctest bubble avoids this. While at it, the unit tests get converted
to ktesting (nicer failure output, no extra indentation needed for
tCtx.SyncTest).
TestPlugin/prebind-fail-with-binding-timeout relied on setting up a claim with
certain time stamps and then getting that test case tested within a certain
real-world time window. It's surprising that this didn't flake more often
because test execution order is random. Now the time stamp gets set right
before the test case is about to be tested. Conversion to a synctest would
be nicer, but synctests cannot have sub-tests, which are used here to track
where log output and failures come from within the larger test case.
Inside the plugin itself some log output gets added to explain why a claim is
unavailable on a node in case of a binding timeout or error during Filter.
The test started without waiting for the ResourceSlice informer to have
synced. As a result, the "CEL-runtime-error-for-one-of-three-nodes" test case
failed randomly with a very low flake rate (less than 1% in local runs) because
CEL expressions never got evaluated due to not having the slices (yet).
Other tests were also less reliable, but are not known to have failed.
Support for DeviceTaintRules depends on a significant amount of
additional code:
- ResourceSlice tracker is a NOP without it.
- Additional informers and corresponding permissions in scheduler and controller.
- Controller code for handling status.
Not all users necessarily need DeviceTaintRules, so adding a second feature
gate for that code makes it possible to limit the blast radius of bugs in that
code without having to turn off device taints and tolerations entirely.
Add a new `bindingTimeout` field to DynamicResources plugin args and wire it
into PreBind.
Changes:
- API: add `bindingTimeout` to DynamicResourcesArgs (staging + internal types).
- Defaults: default to 600 seconds when BOTH DRADeviceBindingConditions and
DRAResourceClaimDeviceStatus are enabled.
- Validation: require >= 1s; forbid when either feature gate is disabled.
- Plugin: plumb args into `pl.bindingTimeout` and use it in
`wait.PollUntilContextTimeout` for binding-condition wait logic.
- Plugin: remove legacy `BindingTimeoutDefaultSeconds`.
Tests:
- Add/adjust unit tests for validation and PreBind timeout path.
- Ensure values below 1s (including negative ones) are rejected and that the field is forbidden when the gates are disabled.
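
The defaulting and validation rules listed above can be sketched as follows. This is a minimal illustration, not the actual scheduler code: the `args` type, `setDefaults`, `validate`, and `checkSeconds` are hypothetical names, and the feature gates are passed in as plain booleans.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// defaultBindingTimeout mirrors the 600 second default named in the commit message.
const defaultBindingTimeout = 600 * time.Second

// args is a hypothetical stand-in for DynamicResourcesArgs.
type args struct {
	// BindingTimeout is nil when the user did not set the field.
	BindingTimeout *time.Duration
}

// setDefaults fills in the timeout only when both feature gates are enabled
// and the field was left unset.
func setDefaults(a *args, bindingConditions, deviceStatus bool) {
	if a.BindingTimeout == nil && bindingConditions && deviceStatus {
		d := defaultBindingTimeout
		a.BindingTimeout = &d
	}
}

// validate rejects values below one second and rejects any explicit value
// when either feature gate is disabled.
func validate(a *args, bindingConditions, deviceStatus bool) error {
	if a.BindingTimeout == nil {
		return nil
	}
	if !bindingConditions || !deviceStatus {
		return errors.New("bindingTimeout: forbidden unless both feature gates are enabled")
	}
	if *a.BindingTimeout < time.Second {
		return fmt.Errorf("bindingTimeout: must be at least 1s, got %s", *a.BindingTimeout)
	}
	return nil
}

// checkSeconds validates a timeout given in whole seconds and reports
// whether it was accepted. It only exists to keep the examples short.
func checkSeconds(seconds int, bindingConditions, deviceStatus bool) bool {
	d := time.Duration(seconds) * time.Second
	return validate(&args{BindingTimeout: &d}, bindingConditions, deviceStatus) == nil
}

func main() {
	a := &args{}
	setDefaults(a, true, true)
	fmt.Println(*a.BindingTimeout, validate(a, true, true))
	fmt.Println(checkSeconds(0, true, true), checkSeconds(-5, true, true), checkSeconds(30, false, true))
}
```

Using a pointer for the field lets the sketch distinguish "unset" from an explicit zero, which is what allows an explicit value to be forbidden while the gates are off.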
class mapping
- Add a new interface "DeviceClassResolver" in the scheduler framework
- Add a global cache of mapping between the extended resource and the
device class
- Cache can be leveraged by the k8s api-server and controller-manager along with the scheduler
- This change helps in delegating the requests to the dynamicresource
plugin based on the mapping during the node update events and thus
avoiding an extra scheduling cycle
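
A rough sketch of what such a resolver and cache could look like is shown below. Only the interface name "DeviceClassResolver" comes from this change; the method name `GetDeviceClass`, its signature, and the `classCache` type are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// deviceClassResolver is a hypothetical shape for the resolver interface:
// it maps an extended resource name to the name of a device class.
type deviceClassResolver interface {
	GetDeviceClass(resourceName string) (string, bool)
}

// classCache is an assumed, concurrency-safe implementation backed by a map.
type classCache struct {
	mu         sync.RWMutex
	byResource map[string]string
}

// Compile-time check that classCache satisfies the interface.
var _ deviceClassResolver = (*classCache)(nil)

func newClassCache() *classCache {
	return &classCache{byResource: map[string]string{}}
}

// set records that the extended resource is backed by the given device class.
func (c *classCache) set(resourceName, className string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byResource[resourceName] = className
}

// remove drops the mapping, for example when the device class is deleted.
func (c *classCache) remove(resourceName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byResource, resourceName)
}

// GetDeviceClass returns the device class for the extended resource, if any.
func (c *classCache) GetDeviceClass(resourceName string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	className, ok := c.byResource[resourceName]
	return className, ok
}

func main() {
	cache := newClassCache()
	cache.set("example.com/gpu", "gpu-class")
	fmt.Println(cache.GetDeviceClass("example.com/gpu"))
	fmt.Println(cache.GetDeviceClass("example.com/unknown"))
}
```

With a lookup like this, a node update event that changes an extended resource could be routed to the dynamicresource plugin only when the resource resolves to a device class.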
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
Previously, the scheduler assumed an extended resource was maintained
by a device plugin if its name was present in the node's Allocatable
map, even if its value was zero. This blocked scheduling when a device
plugin was disconnected or uninstalled, because Kubelet still reported
the resource with Allocatable=0.
This change adds a check for the actual allocatable value in addition
to a key presence check, allowing nodes with uninstalled device
plugins to be considered for scheduling.
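
The difference between the old and new check can be illustrated with a small sketch. The map of plain integers stands in for the node's Allocatable resource list, and `managedByDevicePlugin` is a hypothetical helper name, not the function used in the scheduler.

```go
package main

import "fmt"

// managedByDevicePlugin reports whether the node advertises a usable amount
// of the extended resource. Checking only for the key is not enough, because
// the kubelet keeps reporting the entry with a zero value after the device
// plugin has gone away.
func managedByDevicePlugin(allocatable map[string]int64, resourceName string) bool {
	quantity, present := allocatable[resourceName]
	return present && quantity > 0
}

func main() {
	node := map[string]int64{
		"example.com/gpu":  0, // device plugin uninstalled, entry left behind
		"example.com/fpga": 2, // device plugin active
	}
	fmt.Println(managedByDevicePlugin(node, "example.com/gpu"))
	fmt.Println(managedByDevicePlugin(node, "example.com/fpga"))
	fmt.Println(managedByDevicePlugin(node, "example.com/other"))
}
```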
This prevents the mistake from 1.34 where the default-on
DRAResourceClaimDeviceStatus feature caused the use of the experimental
allocator implementation. The test fails without a fix for that.
TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flaked because the assume cache had not yet synced with the initial
store. The test now waits until the registered handler used by the
assume cache has synced before proceeding with the test.
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled, without v1, won't work.
Added a skipOnWindows flag to the DynamicResources scheduler test case
to skip a test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.
We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
return the context error. The scheduler plugin then can convert that into
"unschedulable".
The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
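
The flow described above could look roughly like the following sketch. All function names are hypothetical, the allocator is simulated with timed steps, and the string results merely stand in for scheduler status codes. How the real plugin reports cancellation by the scheduler is not stated here; the "canceled" result is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// allocate simulates allocation work in steps and checks the context between
// steps, returning the context error as soon as the context is done.
func allocate(ctx context.Context, steps int, stepDuration time.Duration) error {
	for i := 0; i < steps; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(stepDuration):
		}
	}
	return nil
}

// filter applies the optional timeout (zero means none) and converts a
// deadline error into "unschedulable".
func filter(parent context.Context, timeout time.Duration, steps int) string {
	ctx := parent
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(parent, timeout)
		defer cancel()
	}
	err := allocate(ctx, steps, 10*time.Millisecond)
	switch {
	case err == nil:
		return "success"
	case errors.Is(err, context.DeadlineExceeded):
		return "unschedulable"
	case errors.Is(err, context.Canceled):
		return "canceled"
	default:
		return "error"
	}
}

// runFilter calls filter with a background context and a timeout in milliseconds.
func runFilter(timeoutMillis, steps int) string {
	return filter(context.Background(), time.Duration(timeoutMillis)*time.Millisecond, steps)
}

// runCanceled simulates the scheduler canceling the context before Filter runs.
func runCanceled(steps int) string {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return filter(ctx, time.Second, steps)
}

func main() {
	fmt.Println(runFilter(0, 2))   // no timeout configured
	fmt.Println(runFilter(5, 100)) // timeout far shorter than the simulated work
	fmt.Println(runCanceled(3))    // parent context already canceled
}
```

Distinguishing `context.DeadlineExceeded` from `context.Canceled` keeps the plugin's own timeout separate from cancellation that originates in the scheduler.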
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes
apply review comment and fix linter warning
* update-vendor.sh
* update doc comments
* run update-vendor.sh
Thanks to the tracker, the plugin sees all taints directly in the device
definition and can compare them against the tolerations of a request while
trying to find a device for the request.
When the feature is turned off, taints are ignored during scheduling.
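
A simplified sketch of such a comparison follows. The `taint` and `toleration` types are hypothetical and reduced to a few string fields, and the matching rules are modeled on the familiar node taint semantics, which is an assumption rather than a description of the actual allocator code.

```go
package main

import "fmt"

// taint and toleration are simplified, hypothetical types.
type taint struct{ Key, Value, Effect string }

type toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether a single toleration covers a single taint.
// An empty Effect or Key in the toleration matches any effect or key.
func tolerates(tol toleration, t taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	default:
		return false
	}
}

// deviceUsable reports whether every taint on the device is tolerated.
// With the feature disabled, taints are ignored entirely.
func deviceUsable(taints []taint, tolerations []toleration, featureEnabled bool) bool {
	if !featureEnabled {
		return true
	}
	for _, t := range taints {
		tolerated := false
		for _, tol := range tolerations {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	taints := []taint{{Key: "example.com/maintenance", Value: "true", Effect: "NoSchedule"}}
	tolerations := []toleration{{Key: "example.com/maintenance", Operator: "Exists"}}
	fmt.Println(deviceUsable(taints, nil, true))
	fmt.Println(deviceUsable(taints, tolerations, true))
	fmt.Println(deviceUsable(taints, nil, false))
}
```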
The controller is derived from the node taint eviction controller.
In contrast to that controller, it tracks the UID of pods to prevent
deleting the wrong pod when the original pod got replaced.
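
The reason for tracking the UID can be shown with a small in-memory sketch. The `pod` and `store` types and the `evict` method are hypothetical; the real controller talks to the API server rather than a map.

```go
package main

import "fmt"

// pod is a hypothetical, minimal pod record.
type pod struct{ Namespace, Name, UID string }

// store is an in-memory stand-in for the API server, keyed by namespace/name.
type store map[string]pod

// evict deletes the pod only if the pod currently stored under that name
// still has the UID recorded when the eviction was decided. A replacement
// pod with the same name but a different UID is left alone.
func (s store) evict(namespace, name, uid string) bool {
	key := namespace + "/" + name
	current, found := s[key]
	if !found || current.UID != uid {
		return false
	}
	delete(s, key)
	return true
}

func main() {
	pods := store{"default/worker": {Namespace: "default", Name: "worker", UID: "uid-1"}}

	// The pod gets replaced by a new instance with the same name.
	pods["default/worker"] = pod{Namespace: "default", Name: "worker", UID: "uid-2"}

	fmt.Println(pods.evict("default", "worker", "uid-1")) // stale UID, nothing deleted
	fmt.Println(pods.evict("default", "worker", "uid-2")) // current UID, pod deleted
}
```

Keying the eviction on namespace and name alone would have removed the replacement pod in the first call.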
If there was an unexpected status, the code extracting the expected error
message crashed with a panic. This has happened once so far; the reason is
unknown because the unexpected status then didn't get logged.