Commit graph

7189 commits

Author SHA1 Message Date
Kubernetes Prow Robot
4e2bbc78bf
Merge pull request #137170 from pohly/dra-device-taints-beta
DRA device taints: graduate to beta
2026-03-13 00:13:38 +05:30
Patrick Ohly
566dc7f3f3 DRA device taints: graduate to beta
The fields become beta, enabled by default. DeviceTaintRule gets
added to the v1beta2 API, but support for it must remain off by default
because that API group is also off by default.

The v1beta1 API is left unchanged. No-one should be using it
anymore (deprecated in 1.33; it could be removed now if it weren't
needed for reading old objects and for version emulation).

To achieve consistent validation, declarative validation must also be enabled
for v1alpha3 (it was already enabled for the other versions). Otherwise,
TestVersionedValidationByFuzzing fails:

    --- FAIL: TestVersionedValidationByFuzzing (0.09s)
        --- FAIL: TestVersionedValidationByFuzzing/resource.k8s.io/v1beta2,_Kind=DeviceTaintRule (0.00s)
            validation_test.go:109: different error count (0 vs. 1)
                resource.k8s.io/v1alpha3: <no errors>
                resource.k8s.io/v1beta2: "spec.taint.effect: Unsupported value: \"幤HxÒQP¹¬永唂ȳ垞ş]嘨鶊\": supported values: \"NoExecute\", \"NoSchedule\", \"None\""
            ...
2026-03-12 18:26:02 +01:00
Kubernetes Prow Robot
6d92449054
Merge pull request #134290 from huww98/kcm-no-get-pv
Do not get PV for externally deleting volume
2026-03-12 05:13:35 +05:30
Kubernetes Prow Robot
e3c05bfa4e
Merge pull request #136700 from Jefftree/cra-fix
simplify cluster role aggregation and remove update path
2026-03-12 00:45:35 +05:30
Kubernetes Prow Robot
38940f0222
Merge pull request #135297 from michaelasp/svmUpdateCRD
Remove CRD stored versions from status upon SVM migration
2026-03-11 08:03:09 +05:30
Michael Aspinwall
d274e05cc9 Remove CRD stored versions from status upon SVM migration 2026-03-11 00:50:27 +00:00
Kubernetes Prow Robot
aa5abdd371
Merge pull request #136817 from kairosci/fix-gc-notfound-136525
Handle NotFound errors in garbage collector
2026-03-11 03:53:09 +05:30
Alessio Attilio
8ed40e7ae7 test: add unit tests for deleteObject NotFound handling in garbage collector
When deleteObject returns a NotFound error (the object was externally deleted
between the GET and the DELETE), attemptToDeleteItem should enqueue a virtual
delete event and return enqueuedVirtualDeleteEventErr.

Cover both code paths:
- default (background propagation): item with dangling owner
- waitingForDependentsDeletion: item whose owner is foreground-deleting
2026-03-10 20:43:30 +01:00
Kubernetes Prow Robot
21b427c299
Merge pull request #136827 from atombrella/feature/fix_nilness_controller
Fix cases of nilness under pkg/controller.
2026-03-10 15:15:11 +05:30
Kubernetes Prow Robot
3d6026d2fd
Merge pull request #136178 from omerap12/promote-hpa-metrics
promote HPA metrics to beta
2026-03-10 01:19:13 +05:30
Kubernetes Prow Robot
2bbb175707
Merge pull request #137461 from ahmedharabi/fix/statefulset-error-wrapping
statefulset: wrap errors with %w in StatefulPodControl
2026-03-07 00:08:25 +05:30
Jordan Liggitt
45900a1deb
Fix vet error 2026-03-05 18:11:02 -05:00
ahmedharabi
a0dee17c1d statefulset: wrap errors with %w in StatefulPodControl
Signed-off-by: ahmedharabi <harabiahmed88@gmail.com>
2026-03-05 23:02:16 +01:00
Kubernetes Prow Robot
c6f70e3a38
Merge pull request #136399 from tico88612/feat/storage-metric-beta
Rename metric `volume_operation_total_errors` to `volume_operation_errors_total`
2026-03-06 00:46:18 +05:30
Omer Aplatony
3799fc9942
Add unit tests for HPA metrics (#136670)
* Add unit tests for HPA metrics

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* removed mock monitor

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* fmt

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* spelling

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* lint

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* lint

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

---------

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2026-03-05 19:10:26 +05:30
Kubernetes Prow Robot
8bd1505fc0
Merge pull request #137108 from pohly/logtools-update
golangci-lint: bump to logtools v0.10.1
2026-03-05 10:14:16 +05:30
Kubernetes Prow Robot
8275484dcf
Merge pull request #137297 from atombrella/feature/pkg_forvar_modernize
Remove redundant variable re-assignment in for-loops under pkg
2026-03-05 00:28:20 +05:30
xigang
9d10b1f799 refactor: remove unused desiredStateOfWorld parameter from DetermineVolumeAction
Signed-off-by: xigang <wangxigang2014@gmail.com>
2026-03-04 22:01:43 +08:00
Kubernetes Prow Robot
9d7dda7186
Merge pull request #137245 from atombrella/feature/slices_contains_pkg_controller
Update `pkg/controller` to use slices.Contains
2026-03-04 18:04:20 +05:30
Patrick Ohly
b895ce734f golangci-lint: bump to logtools v0.10.1
This fixes a bug that caused log calls involving `klog.Logger` to not be
checked.

As a result we have to fix some code that is now considered faulty:

    ERROR: pkg/controller/serviceaccount/tokens_controller.go:382:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (e *TokensController) generateTokenIfNeeded(ctx context.Context, logger klog.Logger, serviceAccount *v1.ServiceAccount, cachedSecret *v1.Secret) ( /* retry */ bool, error) {
    ERROR: ^
    ERROR: pkg/controller/storageversionmigrator/storageversionmigrator.go:299:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (svmc *SVMController) runMigration(ctx context.Context, logger klog.Logger, gvr schema.GroupVersionResource, resourceMonitor *garbagecollector.Monitor, toBeProcessedSVM *svmv1beta1.StorageVersionMigration, listResourceVersion string) (err error, failed bool) {
    ERROR: ^
    ERROR: pkg/proxy/node.go:121:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to exist", nodeName)
    ERROR: 		^
    ERROR: pkg/proxy/node.go:123:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to be assigned IPs", nodeName)
    ERROR: 		^
    ERROR: pkg/scheduler/backend/queue/scheduling_queue.go:610:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (p *PriorityQueue) runPreEnqueuePlugin(ctx context.Context, logger klog.Logger, pl fwk.PreEnqueuePlugin, pInfo *framework.QueuedPodInfo, shouldRecordMetric bool) *fwk.Status {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:286:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) deleteClaim(ctx context.Context, claim *resourceapi.ResourceClaim, logger klog.Logger) error {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:499:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) waitForExtendedClaimInAssumeCache(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:528:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) createExtendedResourceClaimInAPI(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:592:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) unreserveExtendedResourceClaim(ctx context.Context, logger klog.Logger, pod *v1.Pod, state *stateData) {
    ERROR: ^
    ERROR: pkg/scheduler/framework/runtime/batch.go:171:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (b *OpportunisticBatch) batchStateCompatible(ctx context.Context, logger klog.Logger, pod *v1.Pod, signature fwk.PodSignature, cycleCount int64, state fwk.CycleState, nodeInfos fwk.NodeInfoLister) bool {
    ERROR: ^
    ERROR: staging/src/k8s.io/component-base/featuregate/feature_gate.go:890:4: Additional arguments to Info should always be Key Value pairs. Please check if there is any key or value missing. (logcheck)
    ERROR: 			logger.Info("Warning: SetEmulationVersionAndMinCompatibilityVersion will change already queried feature", "featureGate", feature, "oldValue", oldVal, newVal)
    ERROR: 			^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:108:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("pluginSocksDir: %s", pluginSocksDir)
    ERROR: 	^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:123:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("CDI_ENABLED: %s", cdiEnabled)
    ERROR: 	^

While waiting for this to merge, another call was added which also doesn't
follow conventions:

    ERROR: pkg/kubelet/kubelet.go:2454:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (kl *Kubelet) deletePod(ctx context.Context, logger klog.Logger, pod *v1.Pod) error {
    ERROR: ^

Contextual logging has been beta and enabled by default for several releases
now; what remains is mostly wrapping up and declaring it GA. Therefore, calls
that invoke WithName or WithValues directly (which always take effect) are
left as-is instead of being converted to the klog wrappers (which support
disabling the effect). To allow that, the linter is reconfigured to no longer
complain about this anywhere.

The calls which would have to be fixed otherwise are:

    ERROR: pkg/kubelet/cm/dra/claiminfo.go:170:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-claiminfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:45:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:89:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:157:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/manager.go:175:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:239:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:593:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:781:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(context.Background()).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:898:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager_test.go:1638:15: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 				logger := klog.FromContext(streamCtx).WithName(st.Name())
    ERROR: 				          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:77:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:108:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:161:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker.go:695:14: function "WithValues" should be called through klogr.LoggerWithValues (logcheck)
    ERROR: 			logger := logger.WithValues("device", deviceID)
    ERROR: 			          ^
    ERROR: test/integration/apiserver/watchcache_test.go:42:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd0URL, stopEtcd0, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd0"), "etcd_watchcache0", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/apiserver/watchcache_test.go:47:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd1URL, stopEtcd1, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd1"), "etcd_watchcache1", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/scheduler_perf/scheduler_perf.go:1149:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 		logger = logger.WithName(tCtx.Name())
    ERROR: 		         ^
2026-03-04 12:08:18 +01:00
Kubernetes Prow Robot
5941fed3d6
Merge pull request #136912 from dfajmon/selinux-ga
Promote SELinuxChangePolicy & SELinuxMountReadWriteOncePod to GA
2026-03-03 22:07:29 +05:30
Kubernetes Prow Robot
11c10dc5a0
Merge pull request #136939 from pohly/dra-device-taints-unit-test-improvements
DRA device taints: update unit tests
2026-03-03 02:48:54 +05:30
Mads Jensen
f11bb48738 Remove redundant re-assignment in for-loops under pkg
This applies the forvar rule from modernize. The semantics of the for-loop
changed in Go 1.22 (each iteration gets its own copy of the loop variable),
which makes this pattern obsolete.
2026-03-02 08:47:43 +01:00
ChengHao Yang
5c88906dca
Rename volume_operation_total_errors to volume_operation_errors_total
Rename it to fix a lint error: counter metrics should have a "_total"
suffix. Add a test for `volume_operation_errors_total` and mark
`volume_operation_total_errors` as deprecated.

Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
2026-02-28 20:08:07 +08:00
Kubernetes Prow Robot
330950ca52
Merge pull request #137254 from michaelasp/statefulConsistency
Add the ability for the statefulset controller to read its own writes
2026-02-28 01:39:30 +05:30
Michael Aspinwall
c8e8bd5085 Add the ability for the statefulset controller to read its own writes 2026-02-27 18:21:30 +00:00
Daniel Fajmon
b0919d81a0 Promote SELinuxChangePolicy & SELinuxMountReadWriteOncePod to GA 2026-02-27 14:58:14 +01:00
Patrick Ohly
29e92367db DRA device taints: avoid unnecessary Pod lookup
When rapidly processing informer events, a pod can get queued for eviction
twice (seen only in the TestEviction/update unit test):

- Claim update observed, pod from informer cache with NodeName from update -> queue pod for eviction.
- Pod update observed, claim from informer cache -> queue pod again.

The effect is one additional Get call to the apiserver. We can avoid it by
maintaining an LRU cache with the UIDs of the pods which we have already
evicted and thus don't need to process again.
2026-02-27 14:38:30 +01:00
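The duplicate-eviction fix above can be sketched as a small bounded set of recently evicted pod UIDs that is consulted before doing any apiserver work. This is a hypothetical illustration built on container/list; the commit does not say which LRU implementation the controller uses, and `uidLRU` and `maybeEvict` are invented names.

```go
package main

import (
	"container/list"
	"fmt"
)

// uidLRU is a hypothetical bounded LRU set of pod UIDs.
type uidLRU struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // UID -> element in order
}

func newUIDLRU(capacity int) *uidLRU {
	return &uidLRU{capacity: capacity, order: list.New(), entries: make(map[string]*list.Element)}
}

// Add records an evicted pod UID and drops the least recently used UID
// once the capacity is exceeded.
func (c *uidLRU) Add(uid string) {
	if e, ok := c.entries[uid]; ok {
		c.order.MoveToFront(e)
		return
	}
	c.entries[uid] = c.order.PushFront(uid)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(string))
	}
}

// Contains reports whether the UID is cached and marks it as recently used.
func (c *uidLRU) Contains(uid string) bool {
	e, ok := c.entries[uid]
	if ok {
		c.order.MoveToFront(e)
	}
	return ok
}

// maybeEvict skips the eviction (and its Get call) for pods already handled.
func maybeEvict(evicted *uidLRU, uid string, evict func(string)) {
	if evicted.Contains(uid) {
		return
	}
	evict(uid)
	evicted.Add(uid)
}

func main() {
	evicted := newUIDLRU(2)
	calls := 0
	evict := func(string) { calls++ }
	maybeEvict(evicted, "pod-a", evict)
	maybeEvict(evicted, "pod-a", evict) // queued twice, handled once
	fmt.Println(calls)
}
```

Bounding the cache keeps memory use fixed; an entry that falls out of the cache only costs the one extra Get that the commit is trying to avoid.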
Patrick Ohly
017a53a1a9 DRA device taints: simplify more tests with synctest
In these cases it's certain that no time needs to pass, so synctest.Wait can
replace polling with Eventually. This also means that locking is
not necessary to prevent data races.
2026-02-27 07:47:28 +01:00
Patrick Ohly
4521c34276 DRA device taints: remove usage of testify for unit test
With the built-in tCtx.Assert/Expect in particular, assertions are just as short
when using gomega and often more readable (no more confusion in Equal about which
argument is the expected and which the actual value).
2026-02-27 07:47:28 +01:00
Patrick Ohly
fb94a99d2f DRA device taints: artificially delay pod deletion during test
We can observe the delay in the metric histogram. Because we run in a synctest
bubble, the delay is 100% predictable.

Unfortunately we cannot use the reactor mechanism of the fake client: that
delays while holding the fake's mutex. When some other goroutine (in this case,
the event recorder) calls the client, it gets blocked without being considered
durably blocked by synctest, so time does not advance and the test gets stuck.
2026-02-27 07:47:28 +01:00
Patrick Ohly
7d7b4c3dcb DRA device taint tests: remove List+Watch workaround
This was fixed in client-go itself, no workaround needed anymore.
2026-02-27 07:46:33 +01:00
Patrick Ohly
75626bcf3f DRA device taints: update unit tests
Thanks to waiting for cache sync via channels, the random delays caused by
polling are gone, making the initial setup including cache sync happen
"immediately" when a test starts (= same virtual time). This makes the tests
more predictable and simplifies making further assertions about when something
happens or how long it takes.

While at it, restore previous performance by setting feature gates once and
running tests in parallel again.
2026-02-27 07:46:19 +01:00
Mads Jensen
d11d54dc50 Update pkg/controller to use slices.Contains 2026-02-26 10:17:13 +01:00
Karthik Bhat
43bfd8615d Refactor NewTestContext to return Context instead of TContext 2026-02-26 11:27:26 +05:30
Kubernetes Prow Robot
7ad86d14df
Merge pull request #137243 from michaelasp/fixJobClear
Fix clearing job consistency store for all deletes
2026-02-26 01:12:23 +05:30
Michael Aspinwall
f18f0df7fe Add the ability for the replicaset controller to read its own writes 2026-02-25 17:15:53 +00:00
Michael Aspinwall
008b92e0f6 Fix clearing job consistency store for all deletes 2026-02-25 17:13:50 +00:00
Kubernetes Prow Robot
c6d1649721
Merge pull request #137226 from tchap/selinuxwarning-reverse-index
controller/selinuxwarning/cache: Add reverse index to speed up DeletePod
2026-02-25 21:16:34 +05:30
Kubernetes Prow Robot
9f65538a35
Merge pull request #137224 from tchap/conflicts-parsed
controller/selinuxwarning: Pre-parse SELinux label
2026-02-25 16:27:50 +05:30
Ondra Kupka
911a61d050 controller/selinuxwarning/cache: Add reverse index
Added podToVolumes reverse index to optimize DeletePod.
Currently we simply iterate through all the volumes and remove the pod
being deleted from there. This is inefficient and takes longer the
longer the volume list becomes.

Keeping a map pod -> volumes makes removing a pod fast. We can just jump
to the relevant volumes directly and remove the pod from there.
2026-02-25 11:38:50 +01:00
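A minimal sketch of the reverse-index idea: keep a pod -> volumes map next to the volume -> pods map, so DeletePod visits only the pod's own volumes instead of scanning every volume. Only the name podToVolumes comes from the commit; the rest of the structure (plain string keys, the `volumes` map, dropping empty volume entries) is a hypothetical simplification.

```go
package main

import "fmt"

// volumeCache is a hypothetical simplification of the selinuxwarning cache.
type volumeCache struct {
	volumes      map[string]map[string]struct{} // volume -> pods using it
	podToVolumes map[string]map[string]struct{} // reverse index: pod -> volumes
}

func newVolumeCache() *volumeCache {
	return &volumeCache{
		volumes:      make(map[string]map[string]struct{}),
		podToVolumes: make(map[string]map[string]struct{}),
	}
}

// AddVolume records that pod uses volume and keeps both indexes in sync.
func (c *volumeCache) AddVolume(volume, pod string) {
	if c.volumes[volume] == nil {
		c.volumes[volume] = make(map[string]struct{})
	}
	c.volumes[volume][pod] = struct{}{}
	if c.podToVolumes[pod] == nil {
		c.podToVolumes[pod] = make(map[string]struct{})
	}
	c.podToVolumes[pod][volume] = struct{}{}
}

// DeletePod jumps directly to the pod's volumes, so its cost depends on how
// many volumes the pod uses rather than on the total number of volumes.
func (c *volumeCache) DeletePod(pod string) {
	for volume := range c.podToVolumes[pod] {
		delete(c.volumes[volume], pod)
		if len(c.volumes[volume]) == 0 {
			delete(c.volumes, volume) // sketch choice: drop unused volumes
		}
	}
	delete(c.podToVolumes, pod)
}

func main() {
	c := newVolumeCache()
	c.AddVolume("vol-1", "pod-a")
	c.AddVolume("vol-1", "pod-b")
	c.AddVolume("vol-2", "pod-a")
	c.DeletePod("pod-a")
	fmt.Println(len(c.volumes), len(c.volumes["vol-1"]))
}
```

The trade-off is extra memory and the need to update both maps on every add, in exchange for deletion that no longer slows down as the volume list grows.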
Michael Aspinwall
61d0dd30fb Add the ability for the job controller to read its own writes 2026-02-25 01:19:48 +00:00
Ondra Kupka
a34456319d controller/selinuxwarning: Pre-parse SELinux label
When calling ControllerSELinuxTranslator.Conflicts(), the SELinux label
is repeatedly split into []string to detect conflicts. This causes a huge
number of allocations when there are many comparisons.

This is now made more efficient by pre-parsing the SELinux label and
storing it in podInfo as [4]string for fast comparison when needed.
2026-02-24 18:08:36 +01:00
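A minimal sketch of pre-parsing, assuming the label has the usual user:role:type:level form. Splitting once with strings.SplitN(label, ":", 4) yields a [4]string while keeping any extra colons inside the level (for example "s0:c1,c2"), and later comparisons then need no allocations. The conflict rule shown (an empty component matches anything) is an assumption made for this sketch, not necessarily what ControllerSELinuxTranslator.Conflicts() implements.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedLabel holds the user, role, type and level of an SELinux label.
type parsedLabel [4]string

// parseLabel splits the label once; n=4 keeps colons inside the level.
func parseLabel(label string) parsedLabel {
	var p parsedLabel
	copy(p[:], strings.SplitN(label, ":", 4))
	return p
}

// conflicts compares two pre-parsed labels component by component without
// allocating. Assumed rule: empty components are unset and never conflict.
func conflicts(a, b parsedLabel) bool {
	for i := range a {
		if a[i] != "" && b[i] != "" && a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	a := parseLabel("system_u:object_r:container_file_t:s0:c1,c2")
	b := parseLabel("system_u:object_r:container_file_t:s0:c3,c4")
	c := parseLabel("::container_file_t:s0:c1,c2")
	fmt.Println(a[3], conflicts(a, b), conflicts(a, c))
}
```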
Kubernetes Prow Robot
8812ec563c
Merge pull request #134353 from skitt/drop-string-slice
Deprecate obsolete slice utility functions
2026-02-20 00:57:41 +05:30
Michael Aspinwall
65eb0e94c2 Daemonset Consistency
Add the ability for the daemonset controller to figure out whether it has read its own writes for pods and daemonset objects.
2026-02-19 16:53:19 +00:00
Stephen Kitt
d42d1e3d1f
Deprecate obsolete slice utility functions
... and update users to use standard library functions.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2026-02-16 10:04:33 +01:00
胡玮文
6b4a37194a Do not get PV for externally deleting volume
Currently, we GET each released PV every 15s, in parallel. If there are many released PVs and we cannot finish all the GETs within 15s, other requests starve because the queue waiting for client-side throttling becomes very long.

Even in a normal cluster, these requests account for the majority of all GET requests from KCM (58%, or 470 qps, in our stress test).
2026-02-16 08:51:08 +08:00
Alessio Attilio
ae3163eeca Fix issue 136525: Handle NotFound errors in garbage collector
When objects are deleted externally (e.g., pods deleted directly after
job deletion), the garbage collector should not log errors. This change
adds explicit NotFound error handling in attemptToDeleteItem to enqueue
virtual delete events when deleteObject returns NotFound, treating
external deletion as a successful outcome.

This follows the same pattern already used when getObject returns
NotFound, ensuring consistency across the function.
2026-02-13 00:01:22 +01:00
Kubernetes Prow Robot
d7f6f91dae
Merge pull request #135820 from pohly/dra-sharing-claim-sequentially-test
DRA: sharing claim sequentially test
2026-02-13 01:50:09 +05:30
Kubernetes Prow Robot
98dd4d8e60
Merge pull request #136812 from rpb-ant/rpb/sts-not-found
Add 404 handling for the statefulset controller pod deletion codepath
2026-02-13 00:18:00 +05:30