Implement the RPSR controller that watches ResourcePoolStatusRequest
objects and aggregates pool status from DRA drivers. Add the API server
registry (strategy, storage), handwritten validation, RBAC bootstrap
policy for the controller, kube-controller-manager wiring, table
printer columns, and storage factory registration.
The fields become beta, enabled by default. DeviceTaintRule gets
added to the v1beta2 API, but support for it must remain off by default
because that API group is also off by default.
The v1beta1 API is left unchanged. No-one should be using it
anymore (deprecated in 1.33, could be removed now if it wasn't for
reading old objects and version emulation).
To achieve consistent validation, declarative validation must be enabled also
for v1alpha3 (was already enabled for other versions). Otherwise,
TestVersionedValidationByFuzzing fails:
--- FAIL: TestVersionedValidationByFuzzing (0.09s)
--- FAIL: TestVersionedValidationByFuzzing/resource.k8s.io/v1beta2,_Kind=DeviceTaintRule (0.00s)
validation_test.go:109: different error count (0 vs. 1)
resource.k8s.io/v1alpha3: <no errors>
resource.k8s.io/v1beta2: "spec.taint.effect: Unsupported value: \"幤HxÒQP¹¬永唂ȳ垞ş]嘨鶊\": supported values: \"NoExecute\", \"NoSchedule\", \"None\""
...
* Drop WorkloadRef field and introduce SchedulingGroup field in Pod API
* Introduce v1alpha2 Workload and PodGroup APIs, drop v1alpha1 Workload API
Co-authored-by: yongruilin <yongrlin@outlook.com>
* Run hack/update-codegen.sh
* Adjust kube-scheduler code and integration tests to v1alpha2 API
* Drop v1alpha1 scheduling API group and run make update
---------
Co-authored-by: yongruilin <yongrlin@outlook.com>
Replace all imports of k8s.io/apimachinery/pkg/util/dump with
k8s.io/utils/dump across the repo. The apimachinery dump package
now contains deprecated wrapper functions that delegate to
k8s.io/utils/dump for backwards compatibility.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Support for DeviceTaintRules depends on a significant amount of
additional code:
- ResourceSlice tracker is a NOP without it.
- Additional informers and corresponding permissions in scheduler and controller.
- Controller code for handling status.
Not all users necessarily need DeviceTaintRules, so adding a second feature
gate for that code makes it possible to limit the blast radius of bugs in that
code without having to turn off device taints and tolerations entirely.
To update the right statuses, the controller must collect more information
about why a pod is being evicted. Updating the DeviceTaintRule statuses then is
handled by the same work queue as evicting pods.
Both operations already share the same client instance and thus QPS+server-side
throttling, so they might as well share the same work queue. Deleting pods is
not necessarily more important than informing users or vice-versa, so there is
no strong argument for having different queues.
While at it, switching the unit tests to usage of the same mock work queue as
in staging/src/k8s.io/dynamic-resource-allocation/internal/workqueue. Because
there is no time to add it properly to a staging repo, the implementation gets
copied.