- Add early exit when specific container is found in calculatePodRequestsFromContainers
- Add error handling for non-existent containers following existing patterns
- Maintain all existing functionality for pod-level resources and feature gates
- Include comprehensive function documentation
The optimization eliminates unnecessary container iterations when HPA targets
specific containers, providing significant performance improvements for pods
with many containers while preserving full backward compatibility
The comparison of SELinux labels in KCM tolerates missing fields - the
operating system is going to default them from its defaults, but in KCM we
don't know what the defaults are.
But the OS won't default the last component, "level", which includes also
categories. Make sure that labels with a level set conflicts with level "",
that's what will conflict on the OS too.
* Reject pod when attachment limit is exceeded
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Record admission rejection
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Fix pull-kubernetes-linter-hints
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Fix AD Controller unit test failure
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Consolidate error handling logic in WaitForAttachAndMount
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Improve error context
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Update admissionRejectionReasons to include VolumeAttachmentLimitExceededReason
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Update status message
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Add TestWaitForAttachAndMountVolumeAttachLimitExceededError unit test
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Add e2e test
Signed-off-by: Eddie Torres <torredil@amazon.com>
* Fix pull-kubernetes-linter-hints
Signed-off-by: Eddie Torres <torredil@amazon.com>
---------
Signed-off-by: Eddie Torres <torredil@amazon.com>
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes release >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without the v1 won't work.
Slightly-more-generic replacement for validateEndpointsPortsOrFail()
(but only validates EndpointSlices, not Endpoints).
Also, add two new unit tests to the Endpoints controller, to assert
the correct Endpoints-generating behavior in the cases formerly
covered by the "should serve endpoints on same port and different
protocols" and "should be updated after adding or deleting ports" e2e
tests (since they are now EndpointSlice-only). (There's not much point
in testing the Endpoints controller in "end to end" tests, since
nothing in a normal cluster ever looks at its output, so there's
really only one "end" anyway.)
Currently, the function to translate named port to port number is
located in two places (pod utils and endpointslice lib).
When fixing the bug in restartable init containers, one part of the code
was fixed, but the other part was not, leaving the bug unresolved.
To prevent such partial fixes in the future, we will make the function
in the endpointslice lib public and remove the other part of the code
from pod utils. Then consume the endpointslice lib in k/k.
Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>
It was possible that the object was changed between the live Get and
Delete calls while processing an attempt to delete, causing incorrect
deletion of objects by the garbage collector. A defensive
resourceVersion precondition is added to the delete call to ensure that
the object was properly classified for deletion.
* fix: Truncate too long Deployment name in RS name
* fix: lint & adjust unit tests
* fix: use const for "-" & unit tests
* Add test case for very long hash
* Explicitly define expected deployment name portion
This change modifies the HPA controller to use retry.RetryOnConflict when updating a scale subresource. This prevents the controller from emitting a FailedRescale event on transient API conflicts if a subsequent retry succeeds. If the retry is successful, a SuccessfulRescale event is emitted. If all retries are exhausted and the conflict persists, the original FailedRescale event is emitted. This reduces event noise caused by race conditions where the scale subresource is updated by another process.
Once received job deletion event, it cleans the backoff records for that
job before enqueueing this job so that we can avoid a race condition
that the syncJob() may incorrect use stale backoff records for a newly created
job with same key.
Co-authored-by: Michal Wozniak <michalwozniak@google.com>
After a Node has stopped posting heartbeats for nodeMonitorGracePeriod,
it will be considered unreachable, its ready condition will be set to
Unknown, NoSchedule taint will be added, all Pods on it will be set to
NotReady, but there is always a delay of 5s before NoExecute taint is
added to the Node, adding 5s to the recovery time of Pods which are
supposed to be evicted by the taint and recreated on other Nodes sooner.
The delay is because processTaintBaseEviction() uses the last observed
ready condition of the Node instead of the current one to determine
whether it should add the Node to the taint queue. When a Node is set to
unreachable due to missing heartbeats, the last observed ready condition
is still true and the current ready condition is unknown, we should use
the latter for processTaintBaseEviction().
Signed-off-by: Quan Tian <qtian@vmware.com>
TestCancelEviction flaked with a 0,01% rate because assumed that an event had
already been created once the pod was updated, but that was only true under
some timing conditions.
fake.Clientset suffers from a race condition related to informers:
it does not implement resource version support in its Watch
implementation and instead assumes that watches are set up
before further changes are made.
If a test waits for caches to be synced and then immediately
adds an object, that new object will never be seen by event handlers
if the race goes wrong and the Watch call hadn't completed yet
(can be triggered by adding a sleep before b53b9fb557/staging/src/k8s.io/client-go/tools/cache/reflector.go (L431)).
To work around this, we count all watches and only proceed when
all of them are in place. This replaces the normal watch reactor
(b53b9fb557/staging/src/k8s.io/client-go/kubernetes/fake/clientset_generated.go (L161-L173)).
The creation of the shared informer factory and starting it can be done all in
the same function, which makes it a bit more obvious what happens in which
order and avoids some code duplication.