Commit graph

6958 commits

Author SHA1 Message Date
Kubernetes Prow Robot
dc348645a9
Merge pull request #133116 from 264nm/fix-approved-unissued-csrs
Fix: Add garbage collection to handle Approved-Unissued CSRs
2025-08-27 16:05:34 -07:00
Kubernetes Prow Robot
1c778ab972
Merge pull request #132503 from LoganGoogle/remove-redundant-code
Remove redundant MilliValue call in GetRawMetric for podautoscaler
2025-08-27 14:53:58 -07:00
Kubernetes Prow Robot
4b818b45e4
Merge pull request #132477 from xigang/daemonset_missscheduled
Fix DaemonSet misscheduled status not updating on node taint changes
2025-08-27 14:53:51 -07:00
Omer Aplatony
b9a8dffa51
Fix replicaCount calculation exceeding max int32 (#126979)
* Fix replicaCount calculation exceeding max int32

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* Add test for scaling up with overflow

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

---------

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-08-27 14:53:15 -07:00
Jan Safranek
75d04e6c7b Add a note about Conflicts return value 2025-08-26 15:04:21 +02:00
Jan Chaloupka
83da6f1a87 fix(controller/podautoscaler): do not print panic when .status.lastScaleTime is not set 2025-08-26 14:18:54 +02:00
Maciej Szulik
a0a43e5f80
Drop CronJobsScheduledAnnotation after the feature GA-ed in 1.32
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2025-08-25 14:00:35 +02:00
264nm
9c8e03a40b gofmt cleaner.go 2025-08-25 17:36:35 +10:00
264nm
8b760704fc fix(cleaner.go): exit early on check of CSR issue state 2025-08-22 12:37:55 +10:00
Michael Aspinwall
3bdaeea215 feat: Add discovery check to SVM to ensure migration doesn't get stuck 2025-08-20 16:32:15 +00:00
aditya
bb6a0ea6b2 HPA: optimize calculatePodRequests for specific container lookups
- Add early exit when specific container is found in calculatePodRequestsFromContainers
- Add error handling for non-existent containers following existing patterns
- Maintain all existing functionality for pod-level resources and feature gates
- Include comprehensive function documentation

The optimization eliminates unnecessary container iterations when HPA targets
specific containers, providing significant performance improvements for pods
with many containers while preserving full backward compatibility
2025-08-20 19:13:00 +05:30
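A rough sketch of the early-exit idea described in this commit follows. The container type, the requestsFor function, and the convention that an empty target means "sum all containers" are assumptions made for illustration, not the signatures used in the HPA code.

```go
package main

import "fmt"

// container is a simplified, hypothetical stand-in for a pod container
// with a single CPU request expressed in millicores.
type container struct {
	name     string
	cpuMilli int64
}

// requestsFor returns the request of the named container, stopping at the
// first match. With an empty target it sums all containers. A target that
// does not exist is reported as an error.
func requestsFor(containers []container, target string) (int64, error) {
	if target == "" {
		var total int64
		for _, c := range containers {
			total += c.cpuMilli
		}
		return total, nil
	}
	for _, c := range containers {
		if c.name == target {
			return c.cpuMilli, nil // early exit: no further iteration
		}
	}
	return 0, fmt.Errorf("container %q not found", target)
}

func main() {
	cs := []container{{"app", 250}, {"sidecar", 100}}
	fmt.Println(requestsFor(cs, "app"))
	fmt.Println(requestsFor(cs, ""))
	fmt.Println(requestsFor(cs, "missing"))
}
```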
264nm
ebf3d814f4 Fix(cleaner.go): Add GC to handle Approved-Unissued CSRs 2025-08-20 10:55:07 +10:00
xigang
3eb69eb852 Fix DaemonSet misscheduled status not updating on node taint changes
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-08-12 07:52:40 +08:00
Jan Safranek
97edb4d5e4 Fix SELinux label comparison
The comparison of SELinux labels in KCM tolerates missing fields - the
operating system is going to default them from its defaults, but in KCM we
don't know what the defaults are.

But the OS won't default the last component, "level", which also includes
categories. Make sure that a label with a level set conflicts with level "",
because that is what will conflict on the OS too.
2025-08-08 10:13:19 +02:00
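The comparison rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the KCM implementation: it assumes labels have the form user:role:type:level, and the names labelsConflict and field are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// field returns the i-th component of a split label, or "" if it is absent.
func field(parts []string, i int) string {
	if i < len(parts) {
		return parts[i]
	}
	return ""
}

// labelsConflict compares two labels of the assumed form
// user:role:type:level. A missing user, role, or type is tolerated because
// the OS would default it. The level (which may itself contain ":" for
// categories) is never defaulted, so a set level conflicts with "".
func labelsConflict(a, b string) bool {
	pa := strings.SplitN(a, ":", 4) // at most 4 parts keeps the level whole
	pb := strings.SplitN(b, ":", 4)
	for i := 0; i < 4; i++ {
		fa, fb := field(pa, i), field(pb, i)
		if i < 3 && (fa == "" || fb == "") {
			continue // user, role, type: an empty side is tolerated
		}
		if fa != fb {
			return true
		}
	}
	return false
}

func main() {
	full := "system_u:object_r:container_file_t:s0:c1,c2"
	fmt.Println(labelsConflict(full, "::container_file_t:s0:c1,c2")) // false
	fmt.Println(labelsConflict(full, "::container_file_t"))          // true
}
```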
Jefftree
7242ddd937 Add jefftree to OWNERS 2025-08-04 19:12:13 +00:00
Sunyanan Choochotkaew
7f052afaef
KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
Luiz Oliveira
7fbf63a23f
HPA support for pod-level resource specifications (#132430)
* HPA support for pod-level resource specifications

* Add e2e tests for HPA support for pod-level resource specifications
2025-07-29 09:02:26 -07:00
Eddie
727a6e6db5
Reject pod when attachment limit is exceeded (#132933)
* Reject pod when attachment limit is exceeded

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Record admission rejection

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix AD Controller unit test failure

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Consolidate error handling logic in WaitForAttachAndMount

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Improve error context

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Update admissionRejectionReasons to include VolumeAttachmentLimitExceededReason

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Update status message

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Add TestWaitForAttachAndMountVolumeAttachLimitExceededError unit test

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Add e2e test

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: Eddie Torres <torredil@amazon.com>

---------

Signed-off-by: Eddie Torres <torredil@amazon.com>
2025-07-24 17:58:54 -07:00
Kubernetes Prow Robot
a5d8ab60ef
Merge pull request #132632 from sdowell/gc-rv-race
fix: add RV check on GC delete calls
2025-07-24 17:58:47 -07:00
Kubernetes Prow Robot
7912e5fd67
Merge pull request #131549 from carlory/KEP-3751-GA
[Kep-3751] Promote VolumeAttributesClass to GA
2025-07-24 16:44:27 -07:00
carlory
94bf8fc8a9 Promoted API VolumeAttributesClass and VolumeAttributesClassList to storage.k8s.io/v1.
Promoted feature-gate `VolumeAttributesClass` to GA (on by default)

Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-07-25 01:53:59 +08:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
2025-07-24 08:33:45 +02:00
Kubernetes Prow Robot
6ad14ad876
Merge pull request #132991 from danwinship/endpoints-e2e-updates
Endpoints e2e updates for KEP-4974
2025-07-23 19:56:26 -07:00
Filip Křepinský
2cb48f77f0 schedule pod availability checks at the correct time in ReplicaSets 2025-07-23 18:58:57 +02:00
Taahir Ahmed
4624cb9bb9 Pod Certificates: Basic implementation
* Define feature gate
* Define and serve PodCertificateRequest
* Implement Kubelet projected volume source
* kube-controller-manager GCs PodCertificateRequests
* Add agnhost subcommand that implements a toy signer for testing

Change-Id: Id7ed030d449806410a4fa28aab0f2ce4e01d3b10
2025-07-21 21:49:57 +00:00
Rita Zhang
d42a1d58d0
DRAAdminAccess: add metrics
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-07-18 07:15:41 -07:00
Dan Winship
24065780ed Add e2eendpointslice.WaitForEndpointPorts, use in some tests.
Slightly-more-generic replacement for validateEndpointsPortsOrFail()
(but only validates EndpointSlices, not Endpoints).

Also, add two new unit tests to the Endpoints controller, to assert
the correct Endpoints-generating behavior in the cases formerly
covered by the "should serve endpoints on same port and different
protocols" and "should be updated after adding or deleting ports" e2e
tests (since they are now EndpointSlice-only). (There's not much point
in testing the Endpoints controller in "end to end" tests, since
nothing in a normal cluster ever looks at its output, so there's
really only one "end" anyway.)
2025-07-17 15:34:34 -04:00
Kubernetes Prow Robot
fe13474f61
Merge pull request #106225 from shawnhanx/certificates_cleaner
cleaner.go should use time.Until instead of t.Sub(time.Now())
2025-07-14 23:44:24 -07:00
Kubernetes Prow Robot
566d6acb70
Merge pull request #131759 from carlory/clean-volumehost
Remove unused GetHostIP method
2025-07-12 05:35:28 -07:00
Kubernetes Prow Robot
bedb915a4e
Merge pull request #132781 from PatrickLaabs/132086-pkg-controller-1
chore: depr. pointer pkg replacement for pkg/controller (1/2)
2025-07-07 12:32:24 -07:00
Kubernetes Prow Robot
66cf6286a8
Merge pull request #130909 from Edwinhr716/minreadyseconds-fix
Fix StatefulSetMinReadySeconds healthy concept
2025-07-07 12:31:26 -07:00
PatrickLaabs
baf71997f5 chore: depr. pointer pkg replacement for pkg/controller 2025-07-07 13:22:36 +02:00
PatrickLaabs
8abcdf0885 chore: depr. pointer pkg replacement for pkg/controller 2025-07-07 13:13:39 +02:00
Tsubasa Nagasawa
0ad351281b Cleanup duplicate function to get port number from named port
Currently, the function to translate named port to port number is
located in two places (pod utils and endpointslice lib).
When fixing the bug in restartable init containers, one part of the code
was fixed, but the other part was not, leaving the bug unresolved.
To prevent such partial fixes in the future, we will make the function
in the endpointslice lib public and remove the other part of the code
from pod utils. Then consume the endpointslice lib in k/k.

Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>
2025-07-05 10:03:30 +09:00
Kubernetes Prow Robot
0617903e9d
Merge pull request #131344 from pohly/dra-taint-unit-test-flake-minimal
DRA: work around fake.ClientSet informer deficiency in unit test
2025-07-03 02:51:25 -07:00
Sam Dowell
1c1f00a5f4 fix: add RV check on GC delete calls
It was possible that the object was changed between the live Get and
Delete calls while processing an attempt to delete, causing incorrect
deletion of objects by the garbage collector. A defensive
resourceVersion precondition is added to the delete call to ensure that
the object was properly classified for deletion.
2025-07-02 11:01:56 -07:00
Kubernetes Prow Robot
4186edc4d1
Merge pull request #132615 from mimowo/commonize-pod-indexing
Commonize filtering of Pods by Owner with all orphans in namespace
2025-07-02 02:03:32 -07:00
Kubernetes Prow Robot
a735818b7a
Merge pull request #132533 from nojnhuh/dra-orphan-claim
DRA: fix deleting orphaned ResourceClaim on startup
2025-07-02 02:03:25 -07:00
Heba Elayoty
977c670733
Add unit tests for minReady new behaviour
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
2025-07-01 18:05:26 -07:00
Michal Wozniak
6d5e0bf2a2 review remarks 2025-07-01 16:59:19 +02:00
Michal Wozniak
ac86e67b7d Commonize filtering of Pods by Owner with all orphans in namespace 2025-06-30 08:07:21 +02:00
Huy Pham
b2f27c0649
fix: Truncate too long Deployment name in RS name (#132560)
* fix: Truncate too long Deployment name in RS name

* fix: lint & adjust unit tests

* fix: use const for "-" & unit tests

* Add test case for very long hash

* Explicitly define expected deployment name portion
2025-06-27 16:32:29 -07:00
Jon Huhn
f1845218e2 fixup! DRA: fix deleting orphaned ResourceClaim on startup 2025-06-26 23:21:18 -05:00
Kubernetes Prow Robot
efd2a0d1f5
Merge pull request #132351 from googs1025/fix/hpa_memory
bugfix(hpa): introduce buildQuantity helper for consistent resource quantity
2025-06-26 11:02:35 -07:00
Jon Huhn
ef117edf35 DRA: fix deleting orphaned ResourceClaim on startup 2025-06-25 11:11:43 -05:00
googs1025
b50d508176 bugfix(hpa): introduce buildQuantity helper for consistent resource quantity creation
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-25 08:23:53 +08:00
Kubernetes Prow Robot
5b1af0c8c2
Merge pull request #127655 from guozheng-shen/remove-usage
remove 'endpointsleases' and 'configmapsleases' from usage
2025-06-24 09:54:28 -07:00
Logan Zhai
a352bf8815 Remove redundant MilliValue call in GetRawMetric for podautoscaler,
which has no functional impact.
2025-06-24 14:06:21 +00:00
Kubernetes Prow Robot
49c20d6f44
Merge pull request #132173 from dejanzele/feat/promote-job-pod-replacement-policy-ga
KEP-3939: Job Pod Replacement Policy; promote to GA
2025-06-24 07:04:28 -07:00
xigang
66c611125c Add namespace-aware orphan pod indexing
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-06-19 16:32:20 +08:00
Kubernetes Prow Robot
f407bd6d24
Merge pull request #132254 from carlory/cleanup-MountContainers
Cleanup after Alpha feature MountContainers was removed
2025-06-18 17:24:50 -07:00
Kubernetes Prow Robot
8f1f17a04f
Merge pull request #132305 from xigang/job_index
Job controller optimization: reduce work duration time & minimize cache locking
2025-06-18 05:27:01 -07:00
xigang
91b4816c23 Optimize job controller performance: reduce work duration time & minimize cache locking
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-06-18 15:28:12 +08:00
Kubernetes Prow Robot
17e20ec9d4
Merge pull request #131281 from googs1025/add_miss_shutdown
chore: add miss Shutdown call for selinux_warning controller
2025-06-17 06:18:59 -07:00
Kubernetes Prow Robot
3e39d1074f
Merge pull request #132221 from dims/new-cmp-diff-impl
New implementation for `Diff` (drop in replacement for `cmp.Diff`)
2025-06-16 18:02:58 -07:00
Davanum Srinivas
03afe6471b
Add a replacement for cmp.Diff using json+go-difflib
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-06-16 17:10:42 -04:00
Dejan Zele Pejchev
bccc9fe470
KEP-3939: Job Pod Replacement Policy; promote to GA
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
2025-06-16 16:26:03 +02:00
Filip Křepinský
bdfa8839be calculateStatus should use the same now time point for each pod
make IsPodAvailable time check inclusive
2025-06-14 18:39:15 +02:00
carlory
85bc3cb096 Remove GetExec method from VolumeHost
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-06-13 10:58:37 +08:00
aumpatel
db2555628c Fix: HPA suppresses FailedRescale event on successful conflict retry
This change modifies the HPA controller to use retry.RetryOnConflict when
updating a scale subresource. This prevents the controller from emitting a
FailedRescale event on transient API conflicts if a subsequent retry
succeeds. If the retry is successful, a SuccessfulRescale event is emitted.
If all retries are exhausted and the conflict persists, the original
FailedRescale event is emitted. This reduces event noise caused by race
conditions where the scale subresource is updated by another process.
2025-06-12 07:35:07 -04:00
carlory
f0dde38234 Remove pluginName param from GetMounter and GetExec
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-06-12 17:29:17 +08:00
Kubernetes Prow Robot
089849ac22
Merge pull request #131822 from atiratree/replicationcontroller-terminating-replicas
disable terminatingReplicas reconciliation in ReplicationController
2025-06-10 15:17:01 -07:00
Kubernetes Prow Robot
a26f3fd5c6
Merge pull request #132109 from linxiulei/jobdelay
Clean backoff record earlier
2025-06-06 13:38:38 -07:00
Eric Lin
1f46b3fdbf Clean backoff record earlier
Once a job deletion event is received, the controller cleans the backoff
records for that job before enqueueing the job. This avoids a race condition
in which syncJob() may incorrectly use stale backoff records for a newly
created job with the same key.

Co-authored-by: Michal Wozniak <michalwozniak@google.com>
2025-06-06 18:31:38 +00:00
Kubernetes Prow Robot
a883be6e36
Merge pull request #132031 from atiratree/update-getRSPods
add orphanedPods parameter to getRSPods and improve code flow in syncReplicaSet
2025-06-03 12:10:39 -07:00
Kubernetes Prow Robot
62f72addf2
Merge pull request #120816 from tnqn/fix-unreachable-taint-delay
NoExecute taint should be added when a Node's ready condition becomes Unknown
2025-06-03 00:54:44 -07:00
Filip Křepinský
b7d16fea7f disable terminatingReplicas reconciliation in ReplicationController 2025-05-30 21:08:12 +02:00
Filip Křepinský
aac00c1f0e add orphanedPods parameter to getRSPods
and improve code flow in syncReplicaSet
2025-05-29 10:50:32 +02:00
Antonio Ojea
b9fec8bf4f fix scheme import
Change-Id: I9a94c06b931031a1c2391184342fd5ffa79e3128
2025-05-15 13:46:48 +00:00
Kubernetes Prow Robot
b587977f7c
Merge pull request #131445 from natasha41575/renameObservedGenHelperFns
update godoc for and rename observedGeneration helpers
2025-05-14 11:39:19 -07:00
carlory
fe1b1fff7c Remove unused GetHostIP method 2025-05-14 14:50:59 +08:00
Kubernetes Prow Robot
1325262b5f
Merge pull request #130961 from hakuna-matatah/rs
Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking
2025-05-13 08:43:15 -07:00
Kubernetes Prow Robot
b8d9c12d1b
Merge pull request #131330 from aojea/servicecidr_fixes
servicecidr: only patch status if necessary
2025-05-12 17:53:16 -07:00
Harish Kuna
e42aba6c0c Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking 2025-05-12 19:56:46 +00:00
Quan Tian
f718096b74 NoExecute taint should be added when a Node's ready condition becomes Unknown
After a Node has stopped posting heartbeats for nodeMonitorGracePeriod,
it will be considered unreachable, its ready condition will be set to
Unknown, NoSchedule taint will be added, all Pods on it will be set to
NotReady, but there is always a delay of 5s before NoExecute taint is
added to the Node, adding 5s to the recovery time of Pods which are
supposed to be evicted by the taint and recreated on other Nodes sooner.

The delay is because processTaintBaseEviction() uses the last observed
ready condition of the Node instead of the current one to determine
whether it should add the Node to the taint queue. When a Node is set to
unreachable due to missing heartbeats, the last observed ready condition
is still true while the current ready condition is unknown, so
processTaintBaseEviction() should use the latter.

Signed-off-by: Quan Tian <qtian@vmware.com>
2025-05-10 17:22:11 +08:00
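The difference between deciding on the last observed condition and deciding on the current one can be shown with a small sketch. The types and the needsNoExecuteTaint function are hypothetical simplifications and do not mirror the node lifecycle controller's code.

```go
package main

import "fmt"

// conditionStatus is a simplified stand-in for a Node ready condition.
type conditionStatus string

const (
	condTrue    conditionStatus = "True"
	condFalse   conditionStatus = "False"
	condUnknown conditionStatus = "Unknown"
)

// needsNoExecuteTaint reports whether a node with the given ready
// condition should be queued for the NoExecute taint.
func needsNoExecuteTaint(ready conditionStatus) bool {
	return ready == condFalse || ready == condUnknown
}

func main() {
	// The node just stopped posting heartbeats: the previously observed
	// condition is still True, the freshly computed one is Unknown.
	observed, current := condTrue, condUnknown

	fmt.Println(needsNoExecuteTaint(observed)) // false: taint is delayed
	fmt.Println(needsNoExecuteTaint(current))  // true: taint is queued now
}
```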
Kubernetes Prow Robot
fa10ea63a6
Merge pull request #127050 from omerap12/podautoscaler-ExternalPerpodMetricReplicas-intmax
HPA: Fix int overflow in GetExternalPerPodMetricReplicas
2025-05-09 13:37:14 -07:00
Omer Aplatony
af1d60f30b Add hpa reviewers
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-05-07 18:16:15 +00:00
Omer Aplatony
0acc7bd4dc HPA: Fix int overflow in GetExternalPerPodMetricReplicas
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-05-07 16:26:27 +00:00
Kubernetes Prow Robot
d2507bb01a
Merge pull request #130806 from hakuna-matatah/master
Optimize Statefulset Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
2025-05-06 06:03:13 -07:00
Kubernetes Prow Robot
0b8133816b
Merge pull request #131477 from pohly/golangci-lint@v2
golangci-lint v2
2025-05-02 23:03:55 -07:00
Jordan Liggitt
6bb6c99342
Drop null creationTimestamp from test fixtures 2025-05-02 15:38:40 -04:00
Matthieu MOREL
4adb58565c chore: bump golangci-lint to v2
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-05-02 12:51:02 +02:00
Antonio Ojea
56e533f4a0 servicecidr: only patch status if necessary
Change-Id: I1fadec3e48bd3cb734658b8bfca58bb80ab911b9
2025-05-02 08:26:17 +00:00
Kubernetes Prow Robot
fe5afa919b
Merge pull request #130333 from kmala/job
handle job complete update delayed event
2025-04-25 17:55:22 -07:00
Natasha Sarkar
92359cdc69 update godoc for and rename observedGeneration helpers 2025-04-24 16:05:01 +00:00
Kubernetes Prow Robot
c59203e051
Merge pull request #121967 from torredil/update-logging
Update log verbosity for node health and taint checks
2025-04-24 06:22:34 -07:00
Patrick Ohly
ff108e72a5 DRA device taints: fix rare unit test flake
TestCancelEviction flaked at a rate of 0.01% because it assumed that an event
had already been created once the pod was updated, but that was only true under
some timing conditions.
2025-04-17 17:16:23 +02:00
Patrick Ohly
ff2e6dddc8 DRA device taints: work around fake.ClientSet informer race
fake.Clientset suffers from a race condition related to informers:
it does not implement resource version support in its Watch
implementation and instead assumes that watches are set up
before further changes are made.

If a test waits for caches to be synced and then immediately
adds an object, that new object will never be seen by event handlers
if the race goes wrong and the Watch call hadn't completed yet
(can be triggered by adding a sleep before b53b9fb557/staging/src/k8s.io/client-go/tools/cache/reflector.go (L431)).

To work around this, we count all watches and only proceed when
all of them are in place. This replaces the normal watch reactor
(b53b9fb557/staging/src/k8s.io/client-go/kubernetes/fake/clientset_generated.go (L161-L173)).
2025-04-17 10:57:27 +02:00
Patrick Ohly
638abf0339 DRA device taints: more logging in test 2025-04-17 10:55:13 +02:00
Patrick Ohly
40f2085d68 DRA device taint: clean up test initialization
The creation of the shared informer factory and starting it can be done all in
the same function, which makes it a bit more obvious what happens in which
order and avoids some code duplication.
2025-04-17 10:55:13 +02:00
googs1025
e8dbfc0b6f add miss Shutdown call for selinux_warning controller 2025-04-14 09:07:51 +08:00
Keerthan Reddy Mala
d4fd41285b update the log message to reflect success and failed jobs 2025-04-08 10:21:02 -07:00
Keerthan Reddy Mala
551f3c7824 merge the integration tests into a single one 2025-04-07 17:37:19 -07:00
Keerthan Reddy Mala
c7d0ed5c48 add integration test for job failure event delay and remove the unit test 2025-04-01 12:38:15 -07:00
Filip Křepinský
8db1426554 rename DeploymentPodReplacementPolicy FG to DeploymentReplicaSetTerminatingReplicas 2025-03-27 20:27:44 +01:00
Jean-Marc François
2dd9eda47f Add configurable tolerance logic. 2025-03-21 18:48:37 -04:00
Harish Kuna
c005b85d4d Reduce locking duration on cache to fetch data from Cache 2025-03-21 15:23:08 +00:00
Edwinhr716
8db5f06183 adding commits of the original PR
isHealthy -> isUnavailable, fixed comments

fixed reversed logic

changed logs from unhealthy to unavailable
2025-03-20 22:46:38 +00:00
Keerthan Reddy Mala
1b8bbcac44 Add integration test 2025-03-20 15:04:44 -07:00
Kubernetes Prow Robot
b0d6079ddc
Merge pull request #130947 from pohly/dra-device-taints-flake
DRA device taints: fix some race conditions
2025-03-20 14:16:55 -07:00