Support for DeviceTaintRules depends on a significant amount of
additional code:
- ResourceSlice tracker is a NOP without it.
- Additional informers and corresponding permissions in scheduler and controller.
- Controller code for handling status.
Not all users necessarily need DeviceTaintRules, so adding a second feature
gate for that code makes it possible to limit the blast radius of bugs in that
code without having to turn off device taints and tolerations entirely.
To update the right statuses, the controller must collect more information
about why a pod is being evicted. Updating the DeviceTaintRule statuses then is
handled by the same work queue as evicting pods.
Both operations already share the same client instance and thus QPS+server-side
throttling, so they might as well share the same work queue. Deleting pods is
not necessarily more important than informing users or vice-versa, so there is
no strong argument for having different queues.
While at it, switching the unit tests to usage of the same mock work queue as
in staging/src/k8s.io/dynamic-resource-allocation/internal/workqueue. Because
there is no time to add it properly to a staging repo, the implementation gets
copied.
It hasn't been on-by-default before, therefore it does not get locked to the
new default on yet. This has some impact on the scheduler configuration
because the plugin is now enabled by default.
Because the feature is now GA, it doesn't need to be a label on E2E tests,
which wouldn't be possible anyway once it gets removed entirely.
The pods/finalizer permission can be restricted to just updates because that is
all that matters.
The DeviceTaints rules were under the wrong feature gate check (copy-and-paste)
and must remain disabled when DRA itself becomes enabled.
Thanks to the tracker, the plugin sees all taints directly in the device
definition and can compare it against the tolerations of a request while
trying to find a device for the request.
When the feature is turnedd off, taints are ignored during scheduling.
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
SELinuxMount stays off by default, because it changes the default
kubelet behavior. SELinuxChangePolicy is on by default and notifies users
on Pods that could get broken by SELinuxMount feature gate.
* Add Watch to controller roles
Starting from version 1.32, the client feature `WatchListClient` has been
set to `true` in `kube-controller-manager`.
(commit 06a15c5cf9)
As a result, when the `kube-controller-manager` executes the `List` method,
it utilizes `Watch`. However, there are some existing controller roles that
include `List` but do not include `Watch`. Therefore, when processes using
these controller roles execute the `List` method, `Watch` is executed first,
but due to permission errors, it falls back to `List`.
This PR adds `Watch` to the controller roles that include `List` but do not
include `Watch`.
The affected roles are as follows (prefixed with `system:controller:`):
- `cronjob-controller`
- `endpoint-controller`
- `endpointslice-controller`
- `endpointslicemirroring-controller`
- `horizontal-pod-autoscaler`
- `node-controller`
- `pod-garbage-collector`
- `storage-version-migrator-controller`
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
* Fix Fixture Data
I apologize, the Fixture Data modifications were missed.
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
* Add ControllerRoles Test
Added a test to check that if a controller role includes `List`, it also includes `Watch`.
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
* Fix typo
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
* Add Additional Tests
Added tests to check that if NodeRules, ClusterRoles, and NamespaceRoles
include `List`, it also include `Watch`.
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
---------
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
The WatchListClient feature is enabled for kube-controller-manager, but
namespace-controller misses the necessary "watch" permission, which
results in 30 error logs being generated every time a namespace is
deleted and falling back to the standard LIST semantics.
Signed-off-by: Quan Tian <quan.tian@broadcom.com>
The SELinuxWarningController does not necessarily need permissions to read
the objects, because it gets them through a shared informer instantiated by
KCM itself, but let's list the permissions for completeness.
This removes the DRAControlPlaneController feature gate, the fields controlled
by it (claim.spec.controller, claim.status.deallocationRequested,
claim.status.allocation.controller, class.spec.suitableNodes), the
PodSchedulingContext type, and all code related to the feature.
The feature gets removed because there is no path towards beta and GA and DRA
with "structured parameters" should be able to replace it.