Commit graph

21 commits

Author SHA1 Message Date
Michael Aspinwall
d274e05cc9 Remove CRD stored versions from status upon SVM migration 2026-03-11 00:50:27 +00:00
Patrick Ohly
b895ce734f golangci-lint: bump to logtools v0.10.1
This fixes a bug that caused log calls involving `klog.Logger` to not be
checked.

As a result we have to fix some code that is now considered faulty:

    ERROR: pkg/controller/serviceaccount/tokens_controller.go:382:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (e *TokensController) generateTokenIfNeeded(ctx context.Context, logger klog.Logger, serviceAccount *v1.ServiceAccount, cachedSecret *v1.Secret) ( /* retry */ bool, error) {
    ERROR: ^
    ERROR: pkg/controller/storageversionmigrator/storageversionmigrator.go:299:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (svmc *SVMController) runMigration(ctx context.Context, logger klog.Logger, gvr schema.GroupVersionResource, resourceMonitor *garbagecollector.Monitor, toBeProcessedSVM *svmv1beta1.StorageVersionMigration, listResourceVersion string) (err error, failed bool) {
    ERROR: ^
    ERROR: pkg/proxy/node.go:121:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to exist", nodeName)
    ERROR: 		^
    ERROR: pkg/proxy/node.go:123:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to be assigned IPs", nodeName)
    ERROR: 		^
    ERROR: pkg/scheduler/backend/queue/scheduling_queue.go:610:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (p *PriorityQueue) runPreEnqueuePlugin(ctx context.Context, logger klog.Logger, pl fwk.PreEnqueuePlugin, pInfo *framework.QueuedPodInfo, shouldRecordMetric bool) *fwk.Status {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:286:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) deleteClaim(ctx context.Context, claim *resourceapi.ResourceClaim, logger klog.Logger) error {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:499:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) waitForExtendedClaimInAssumeCache(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:528:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) createExtendedResourceClaimInAPI(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:592:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) unreserveExtendedResourceClaim(ctx context.Context, logger klog.Logger, pod *v1.Pod, state *stateData) {
    ERROR: ^
    ERROR: pkg/scheduler/framework/runtime/batch.go:171:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (b *OpportunisticBatch) batchStateCompatible(ctx context.Context, logger klog.Logger, pod *v1.Pod, signature fwk.PodSignature, cycleCount int64, state fwk.CycleState, nodeInfos fwk.NodeInfoLister) bool {
    ERROR: ^
    ERROR: staging/src/k8s.io/component-base/featuregate/feature_gate.go:890:4: Additional arguments to Info should always be Key Value pairs. Please check if there is any key or value missing. (logcheck)
    ERROR: 			logger.Info("Warning: SetEmulationVersionAndMinCompatibilityVersion will change already queried feature", "featureGate", feature, "oldValue", oldVal, newVal)
    ERROR: 			^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:108:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("pluginSocksDir: %s", pluginSocksDir)
    ERROR: 	^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:123:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("CDI_ENABLED: %s", cdiEnabled)
    ERROR: 	^

While waiting for this to merge, another call was added which also doesn't
follow conventions:

    ERROR: pkg/kubelet/kubelet.go:2454:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (kl *Kubelet) deletePod(ctx context.Context, logger klog.Logger, pod *v1.Pod) error {
    ERROR: ^

Contextual logging has been beta and enabled by default for several releases
now. It's mostly just a matter of wrapping up and declaring it GA. Therefore
the calls which directly call WithName or WithValues (always have an effect)
are left as-is instead of converting them to use the klog wrappers (support
disabling the effect). To allow that, the linter gets reconfigured to not
complain about this anymore, anywhere.

The calls which would have to be fixed otherwise are:

    ERROR: pkg/kubelet/cm/dra/claiminfo.go:170:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-claiminfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:45:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:89:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:157:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/manager.go:175:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:239:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:593:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:781:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(context.Background()).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:898:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager_test.go:1638:15: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 				logger := klog.FromContext(streamCtx).WithName(st.Name())
    ERROR: 				          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:77:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:108:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:161:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker.go:695:14: function "WithValues" should be called through klogr.LoggerWithValues (logcheck)
    ERROR: 			logger := logger.WithValues("device", deviceID)
    ERROR: 			          ^
    ERROR: test/integration/apiserver/watchcache_test.go:42:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd0URL, stopEtcd0, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd0"), "etcd_watchcache0", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/apiserver/watchcache_test.go:47:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd1URL, stopEtcd1, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd1"), "etcd_watchcache1", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/scheduler_perf/scheduler_perf.go:1149:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 		logger = logger.WithName(tCtx.Name())
    ERROR: 		         ^
2026-03-04 12:08:18 +01:00
Tanner
bb8b4b0d80 Correct the usage of vlog's .Error() or .V().Info() methods 2026-01-06 20:31:04 -05:00
Kubernetes Prow Robot
41673c7198
Merge pull request #134910 from tchap/kcm-controllers-thread-mgmt
pkg/controller: Improve goroutine management
2025-11-03 17:58:03 -08:00
Kubernetes Prow Robot
3ec2d82da5
Merge pull request #134784 from michaelasp/svm_beta2
SVM: bump the API to beta, remove unused fields
2025-10-29 13:56:02 -07:00
Michael Aspinwall
3b72759d1b Update SVM to Beta
Co-authored-by: Stanislav Láznička <stlaz.devel@proton.me>
2025-10-29 19:36:11 +00:00
Ondra Kupka
821a3f5aff controller/storageversionmigrator: Improve goroutine mgmt
Make sure all threads are terminated when Run returns.
2025-10-29 19:04:37 +01:00
Kubernetes Prow Robot
82e2a3e117
Merge pull request #134763 from jpbetz/api-machinery-controller-owners
Clean up OWNERS for all api machinery owned controllers
2025-10-24 12:15:34 -07:00
Joe Betz
0e7d42dd45 Clean up OWNERS for all api machinery owned controllers 2025-10-24 13:38:10 -04:00
Kubernetes Prow Robot
3c4a7184d8
Merge pull request #134231 from michaelasp/svmRobustness
feat: Improve the robustness of SVM
2025-10-24 10:09:38 -07:00
Joe Betz
d54f3b56ed
Add OWNERS for controller/storageversionmigrator 2025-10-21 17:06:53 -04:00
Michael Aspinwall
4616d749a3 feat: Improve the robustness of SVM
Add handling for retriable errors, terminal cases and unit tests for SVM. Also move the comparisons of resource version to use the built in helper provided by apimachinery.
2025-10-09 15:13:05 +00:00
Kubernetes Prow Robot
d03d25f47c
Merge pull request #133985 from aditigupta96/api-waitfornamedcachesync-with-context
Replace WaitForNamedCacheSync with WaitForNamedCacheSyncWithContext in pkg/controller/
2025-09-16 17:18:09 -07:00
Aditi Gupta
af231d2153 Replace WaitForNamedCacheSync with WaitForNamedCacheSyncWithContext in pkg/controller/ 2025-09-16 14:51:34 -07:00
Michael Aspinwall
1a0813598b Update SVM Discovery checks in response to jpbetz and stlaz 2025-09-05 20:33:05 +00:00
Michael Aspinwall
21359d7b1f Switch to resourceVersion controller 2025-09-04 18:17:00 +00:00
Michael Aspinwall
e1218922db Add unit tests to isResourceUpdatable 2025-08-28 21:04:59 +00:00
Michael Aspinwall
3bdaeea215 feat: Add discovery check to SVM to ensure migration doesn't get stuck 2025-08-20 16:32:15 +00:00
Monis Khan
6a6771b514
svm: set UID and RV on SSA patch to cause conflict on logical create
When a resource gets deleted during migration, the SVM SSA patch
calls are interpreted as a logical create request.  Since the object
from storage is nil, the merged result is just a type meta object,
which lacks a name in the body.  This fails when the API server
checks that the name from the request URL and the body are the same.
Note that a create request is something that SVM controller should
never do.

Once the UID is set on the patch, the API server will fail the
request at a slightly earlier point with an "uid mismatch" conflict
error, which the SVM controller can handle gracefully.

Setting UID by itself is not sufficient.  When a resource gets
deleted and recreated, if RV is not set but UID is set, we would get
an immutable field validation error for attempting to update the
UID.  To address this, we set the resource version on the SSA patch
as well.  This will cause that update request to also fail with a
conflict error.

Added the create verb on all resources for SVM controller RBAC as
otherwise the API server will reject the request before it fails
with a conflict error.

The change addresses a host of other issues with the SVM controller:

1. Include failure message in SVM resource
2. Do not block forever on unsynced GC monitor
3. Do not immediately fail on GC monitor being missing, allow for
   a grace period since discovery may be out of sync
4. Set higher QPS and burst to handle large migrations

Test changes:

1. Clean up CRD webhook convertor logs
2. Allow SVM tests to be run multiple times to make finding flakes easier
3. Create and delete CRs during CRD test to force out any flakes
4. Add a stress test with multiple parallel migrations
5. Enable RBAC on KAS
6. Run KCM directly to exercise wiring and RBAC
7. Better logs during CRD migration
8. Scan audit logs to confirm SVM controller never creates

Signed-off-by: Monis Khan <mok@microsoft.com>
2024-07-18 17:19:11 -04:00
Alvaro Aleman
6d0ac8c561 Use the generic/typed workqueue throughout
This change makes us use the generic workqueue throughout the project in
order to improve type safety and readability of the code.
2024-05-04 14:33:12 -04:00
Nilekh Chaudhari
9161302e7f
feat: implements svm controller
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2024-03-08 19:25:10 +00:00