If someone gains the ability to create static pods, they might try to use that
ability to run code which gets access to the resources associated with some
existing claim which was previously allocated for some other pod. Such an
attempt already fails because the claim status tracks which pods are allowed to
use the claim, the static pod is not in that list, the node is not authorized
to add it, and the kubelet checks that list before starting the pod in
195803cde5/pkg/kubelet/cm/dra/manager.go (L218-L222).
Even if the pod were started, DRA drivers typically manage node-local resources
which can already be accessed via such an attack without involving DRA. DRA
drivers which manage non-node-local resources have to consider access by a
compromised node as part of their threat model.
Nonetheless, it is better to not accept static pods which reference
ResourceClaims or ResourceClaimTemplates in the first place because there
is no valid use case for it.
This is done at different levels for defense in depth:
- configuration validation in the kubelet
- admission checking of node restrictions
- API validation
Co-authored-by: Jordan Liggitt <liggitt@google.com>
Code changes by Jordan, with one small change (resourceClaims -> resourceclaims).
Unit tests by Patrick.
Added tests to check that if NodeRules, ClusterRoles, and NamespaceRoles
include `List`, it also include `Watch`.
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
Starting from version 1.32, the client feature `WatchListClient` has been
set to `true` in `kube-controller-manager`.
(commit 06a15c5cf9)
As a result, when the `kube-controller-manager` executes the `List` method,
it utilizes `Watch`. However, there are some existing controller roles that
include `List` but do not include `Watch`. Therefore, when processes using
these controller roles execute the `List` method, `Watch` is executed first,
but due to permission errors, it falls back to `List`.
This PR adds `Watch` to the controller roles that include `List` but do not
include `Watch`.
The affected roles are as follows (prefixed with `system:controller:`):
- `cronjob-controller`
- `endpoint-controller`
- `endpointslice-controller`
- `endpointslicemirroring-controller`
- `horizontal-pod-autoscaler`
- `node-controller`
- `pod-garbage-collector`
- `storage-version-migrator-controller`
Signed-off-by: Mitsuru Kariya <mitsuru.kariya@nttdata.com>
The WatchListClient feature is enabled for kube-controller-manager, but
namespace-controller misses the necessary "watch" permission, which
results in 30 error logs being generated every time a namespace is
deleted and falling back to the standard LIST semantics.
Signed-off-by: Quan Tian <quan.tian@broadcom.com>
The SELinuxWarningController does not necessarily need permissions to read
the objects, because it gets them through a shared informer instantiated by
KCM itself, but let's list the permissions for completeness.
This removes the DRAControlPlaneController feature gate, the fields controlled
by it (claim.spec.controller, claim.status.deallocationRequested,
claim.status.allocation.controller, class.spec.suitableNodes), the
PodSchedulingContext type, and all code related to the feature.
The feature gets removed because there is no path towards beta and GA and DRA
with "structured parameters" should be able to replace it.
Having a dedicated ActionType which only gets used when the scheduler itself
already detects some change in the list of generated ResourceClaims of a pod
avoids calling the DRA plugin for unrelated Pod changes.
Retitle the feature to the affirmative ("AllowInsecure...=false") instead of a
double-negative ("Disable$NEWTHING...=false") for clarity
Signed-off-by: Micah Hausler <mhausler@amazon.com>
This is a complete revamp of the original API. Some of the key
differences:
- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount
of similar devices in a single request
- no class for ResourceClaims, instead individual
device requests are associated with a mandatory
DeviceClass
For the sake of simplicity, optional basic types (ints, strings) where the null
value is the default are represented as values in the API types. This makes Go
code simpler because it doesn't have to check for nil (consumers) and values
can be set directly (producers). The effect is that in protobuf, these fields
always get encoded because `opt` only has an effect for pointers.
The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new
"request" field. This is considered acceptable because the entire `claims`
field in the pod spec is still alpha.
The implementation is complete enough to bring up the apiserver.
Adapting other components follows.