kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-06-23 00:12:09 -04:00

Author	SHA1	Message	Date
dom4ha	88948acc38	Remove v1alpha2 API definitions Update client-go lister and informer imports to v1alpha3	2026-05-22 12:50:19 +00:00
dom4ha	8a52fb2ea9	Migrate references to v1alpha3 in tests, controllers, and remaining files	2026-05-22 12:50:19 +00:00
Patrick Ohly	4a305f8fc7	DRA: fix component list for ResourceClaim metric The endpoint-mappings.yaml file specifies which components use which metrics. The case some, but not all core components (kube-controller-manager and kube-scheduler in this case) sharing the same metrics was not supported. This gets fixed by not returning early once the first file path matches. Not all metrics in pkg/controller/resourceclaim/metrics are shared. To make the sharing clearer and fit into the file-path based component support in endpoint-mappings.yaml, the shared metric gets moved to a new pkg/resourceclaim/metrics package.	2026-05-11 12:31:45 +02:00
John Belamaric	57aae64982	Fix flapping pod.status.resourceClaimStatuses resourceclaimcontroller: fix incorrect SSA apply in syncPod method The ResourceClaimController's syncPod method only includes new resource claims in the server-side apply, not existing claims. Since this controller is the owning fieldManager, SSA removes the missing existing keys. This results in flapping between claims when more than one claim is assigned to the Pod. This fix includes the existing claims in the SSA request. Signed-off-by: John Belamaric <jbelamaric@google.com>	2026-04-17 14:56:18 +00:00
Jon Huhn	d80f384b70	Workload API: PodGroup ResourceClaims (KEP-5729)	2026-03-22 14:52:45 -05:00
Alay Patel	b9729e8197	kep-5304: add DeviceMetadata API	2026-03-18 08:29:42 -04:00
Patrick Ohly	bff684d951	DRA ResourceClaim controller: update logging This provides a bit more information when the controller touches a ResourceClaim.	2026-02-12 12:33:22 +01:00
MohammedSaalif	4925c6bea4	DRA: support non-pod references in ReservedFor (#136450) * DRA: support non-pod references in ReservedFor Signed-off-by: MohammedSaalif <salifud2004@gmail.com> * Expand reservation validation comment in syncClaim as suggested by mortent * Address feedback: rename valid to remaining and remove obsolete TODO --------- Signed-off-by: MohammedSaalif <salifud2004@gmail.com>	2026-01-25 00:28:13 +05:30
Goend	13e46ffc45	Fix the issue of slow creation of ResourceClaim in specific scenarios	2026-01-06 19:18:58 +08:00
Ayato Tokubi	320987ead3	Addressed comments	2025-11-05 10:44:50 +00:00
Ayato Tokubi	5102591a6b	Refactor resource claim metrics to use structured labels and add "source" dimension. Signed-off-by: Ayato Tokubi <atokubi@redhat.com>	2025-11-05 09:52:47 +00:00
Kubernetes Prow Robot	41673c7198	Merge pull request #134910 from tchap/kcm-controllers-thread-mgmt pkg/controller: Improve goroutine management	2025-11-03 17:58:03 -08:00
yliao	4f647b3f3d	removed BlockOwnerDeletion	2025-10-29 22:41:10 +00:00
Ondra Kupka	63c15cbe83	controller/resourceclaim: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-10-29 19:07:10 +01:00
Mayank Agrawal	5e216ae34d	Replace HandleCrash and HandleError calls to use context-aware alternatives	2025-10-07 22:40:10 -07:00
Alay Patel	8a03067211	fix resource claims deallocation for extended resource when pod is completed Signed-off-by: Alay Patel <alayp@nvidia.com>	2025-09-29 15:15:40 -04:00
Aditi Gupta	af231d2153	Replace WaitForNamedCacheSync with WaitForNamedCacheSyncWithContext in pkg/controller/	2025-09-16 14:51:34 -07:00
Patrick Ohly	5c4f81743c	DRA: use v1 API As before when adding v1beta2, DRA drivers built using the k8s.io/dynamic-resource-allocation helper packages remain compatible with all Kubernetes releases >= 1.32. The helper code picks whatever API version is enabled from v1beta1/v1beta2/v1. However, the control plane now depends on v1, so a cluster configuration where only v1beta1 or v1beta2 is enabled without v1 won't work.	2025-07-24 08:33:45 +02:00
Rita Zhang	d42a1d58d0	DRAAdminAccess: add metrics Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>	2025-07-18 07:15:41 -07:00
Jon Huhn	ef117edf35	DRA: fix deleting orphaned ResourceClaim on startup	2025-06-25 11:11:43 -05:00
Morten Torkildsen	36d8a44b9c	DRA: Update controller for Prioritized Alternatives in Device Requests	2025-02-28 19:32:59 +00:00
Patrick Ohly	4638ba9716	client-go/tools/cache: add APIs with context parameter The context is used for cancellation and to support contextual logging. In most cases, alternative *WithContext APIs get added, except for NewIntegerResourceVersionMutationCache where code searches indicate that the API is not used downstream. An API break around SharedInformer couldn't be avoided because the alternative (keeping the interface unchanged and adding a second one with the new method) would have been worse. controller-runtime needs to be updated because it implements that interface in a test package. Downstream consumers of controller-runtime will work unless they use those test package. Converting Kubernetes to use the other new alternatives will follow. In the meantime, usage of the new alternatives cannot be enforced via logcheck yet (see https://github.com/kubernetes/kubernetes/issues/126379 for the process). Passing context through and checking it for cancellation is tricky for event handlers. A better approach is to map the context cancellation to the normal removal of an event handler via a helper goroutine. Thanks to the new HandleErrorWithLogr and HandleCrashWithLogr, remembering the logger is sufficient for handling problems at runtime.	2024-12-18 18:45:02 +01:00
Patrick Ohly	33ea278c51	DRA: use v1beta1 API No code is left which depends on the v1alpha3, except of course the code implementing that version.	2024-11-06 13:03:19 +01:00
Patrick Ohly	4419568259	DRA: treat AdminAccess as a new feature gated field Using the "normal" logic for a feature gated field simplifies the implementation of the feature gate. There is one (entirely theoretic!) problem with updating from 1.31: if a claim was allocated in 1.31 with admin access, the status field was not set because it didn't exist yet. If a driver now follows the current definition of "unset = off", then it will not grant admin access even though it should. This is theoretic because drivers are starting to support admin access with 1.32, so there shouldn't be any claim where this problem could occur.	2024-10-29 10:22:31 +01:00
Patrick Ohly	9a7e4ccab2	DRA admin access: add feature gate The new DRAAdminAccess feature gate has the following effects: - If disabled in the apiserver, the spec.devices.requests[*].adminAccess field gets cleared. Same in the status. In both cases the scenario that it was already set and a claim or claim template get updated is special: in those cases, the field is not cleared. Also, allocating a claim with admin access is allowed regardless of the feature gate and the field is not cleared. In practice, the scheduler will not do that. - If disabled in the resource claim controller, creating ResourceClaims with the field set gets rejected. This prevents running workloads which depend on admin access. - If disabled in the scheduler, claims with admin access don't get allocated. The effect is the same. The alternative would have been to ignore the fields in claim controller and scheduler. This is bad because a monitoring workload then runs, blocking resources that probably were meant for production workloads.	2024-10-29 09:50:11 +01:00
Patrick Ohly	c2524cbf9b	DRA resourceclaims: maintain metric of total and allocated claims These metrics can provide insights into ResourceClaim usage. The total count is redundant because the apiserver also provides count of resources, but having it in the same sub-system next to the count of allocated claims might be more discoverable and helps monitor the controller itself.	2024-10-18 09:13:42 +02:00
Kubernetes Prow Robot	b1b4e5d397	Merge pull request #128003 from pohly/dra-classic-dra-removal DRA: remove "classic DRA"	2024-10-18 00:55:17 +01:00
Patrick Ohly	d572df2493	DRA resource claim controller: improve log messages Some code paths didn't log anything. One log message about "claim got deleted" was incorrect.	2024-10-17 18:28:55 +02:00
Patrick Ohly	f84eb5ecf8	DRA: remove "classic DRA" This removes the DRAControlPlaneController feature gate, the fields controlled by it (claim.spec.controller, claim.status.deallocationRequested, claim.status.allocation.controller, class.spec.suitableNodes), the PodSchedulingContext type, and all code related to the feature. The feature gets removed because there is no path towards beta and GA and DRA with "structured parameters" should be able to replace it.	2024-10-16 23:09:50 +02:00
Kevin Hannon	03da672159	remove 1.27 deterministic support for resource claims	2024-09-18 08:25:06 -04:00
Patrick Ohly	de5742ae83	DRA: remove immediate allocation As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate allocation is one of those features which can be removed because it makes no sense for structured parameters and the justification for classic DRA is weak.	2024-07-21 17:28:14 +02:00
Patrick Ohly	b51d68bb87	DRA: bump API v1alpha2 -> v1alpha3 This is in preparation for revamping the resource.k8s.io completely. Because there will be no support for transitioning from v1alpha2 to v1alpha3, the roundtrip test data for that API in 1.29 and 1.30 gets removed. Repeating the version in the import name of the API packages is not really required. It was done for a while to support simpler grepping for usage of alpha APIs, but there are better ways for that now. So during this transition, "resourceapi" gets used instead of "resourcev1alpha3" and the version gets dropped from informer and lister imports. The advantage is that the next bump to v1beta1 will affect fewer source code lines. Only source code where the version really matters (like API registration) retains the versioned import.	2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot	ac9aec9f9b	Merge pull request #125116 from pohly/dra-one-of-source DRA: remove "source" indirection from v1 Pod API	2024-06-28 12:46:45 -07:00
Patrick Ohly	bde9b64cdf	DRA: remove "source" indirection from v1 Pod API This makes the API nicer: resourceClaims: - name: with-template resourceClaimTemplateName: test-inline-claim-template - name: with-claim resourceClaimName: test-shared-claim Previously, this was: resourceClaims: - name: with-template source: resourceClaimTemplateName: test-inline-claim-template - name: with-claim source: resourceClaimName: test-shared-claim A more long-term benefit is that other, future alternatives might not make sense under the "source" umbrella. This is a breaking change. It's justified because DRA is still alpha and will have several other API breaks in 1.31.	2024-06-27 17:53:24 +02:00
Kubernetes Prow Robot	92e0db2bbf	Merge pull request #125640 from googs1025/resourceclaim_controller_log_fix1 added resourceclaim_controller log info	2024-06-27 03:20:10 -07:00
googs1025	5f8fb17652	added resourceclaim_controller log info Signed-off-by: googs1025 <googs1025@gmail.com>	2024-06-26 18:38:11 +08:00
Patrick Ohly	2da9e660e3	resourceclaim controller: add missing log output The logging was fairly complete about not doing something, but the actual ResourceClaim creation was not logged.	2024-06-25 16:12:31 +02:00
liyuerich	8e97c0ff7d	drop deprecated pointer package in controller Signed-off-by: liyuerich <yue.li@daocloud.io> Update job_controller.go Signed-off-by: liyuerich <yue.li@daocloud.io>	2024-05-09 11:34:25 +08:00
Kubernetes Prow Robot	1dc30bf90f	Merge pull request #124600 from alvaroaleman/typed-wq Use the generic/typed workqueue throughout	2024-05-06 16:18:31 -07:00
carlory	76aa289608	bugfix: resourceclaim forgot to wait for podSchedulingSynced and templatesSynced	2024-05-06 16:56:16 +08:00
Alvaro Aleman	6d0ac8c561	Use the generic/typed workqueue throughout This change makes us use the generic workqueue throughout the project in order to improve type safety and readability of the code.	2024-05-04 14:33:12 -04:00
Xuzheng Chang	3e08030d53	fix wrong comments of dra Signed-off-by: Xuzheng Chang <changxuzheng@huawei.com>	2024-04-09 09:41:25 +08:00
Patrick Ohly	3de376ecf6	dra controller: support structured parameters When allocation was done by the scheduler, the controller needs to do the deallocation because there is no control-plane controller which could react to "DeallocationRequested".	2024-03-07 22:22:13 +01:00
Mengjiao Liu	b584b87a94	kube-controller-manager: readjust log verbosity - Increase the global level for broadcaster's logging to 3 so that users can ignore event messages by lowering the logging level. It reduces information noise. - Making sure the context is properly injected into the broadcaster, this will allow the -v flag value to be used also in that broadcaster, rather than the above global value. - test: use cancellation from ktesting - golangci-hints: checked error return value	2024-02-26 14:51:56 +08:00
Patrick Ohly	3c2cfd9a4f	resource claim controller: separate generated suffix from base When the resource claim name inside the pod had some suffix like "1a" in "resource-1a", the generated name suffix got added directly after that, leading to "my-pod-resource-1ax6zgt". Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".	2023-09-04 09:45:25 +02:00
Patrick Ohly	80ab8f0542	dra: handle scheduled pods in kube-controller-manager When someone decides that a Pod should definitely run on a specific node, they can create the Pod with spec.nodeName already set. Some custom scheduler might do that. Then kubelet starts to check the pod and (if DRA is enabled) will refuse to run it, either because the claims are still waiting for the first consumer or the pod wasn't added to reservedFor. Both are things the scheduler normally does. Also, if a pod got scheduled while the DRA feature was off in the kube-scheduler, a pod can reach the same state. The resource claim controller can handle these two cases by taking over for the kube-scheduler when nodeName is set. Triggering an allocation is simpler than in the scheduler because all it takes is creating the right PodSchedulingContext with spec.selectedNode set. There's no need to list nodes because that choice was already made, permanently. Adding the pod to reservedFor also isn't hard. What's currently missing is triggering de-allocation of claims to re-allocate them for the desired node. This is not important for claims that get created for the pod from a template and then only get used once, but it might be worthwhile to add de-allocation in the future.	2023-07-13 21:27:11 +02:00
Patrick Ohly	5cec6d798c	dra: revamp event handlers in kube-controller-manager Enabling logging is useful to track what the code is doing. There are some functional changes: - The pod handler checks for existence of claims. This avoids adding pods to the work queue in more cases when nothing needs to be done, at the cost of making the event handlers a bit slower. This will become more important when adding more work to the controller - The handler for deleted ResourceClaim did not check for cache.DeletedFinalStateUnknown.	2023-07-13 21:27:11 +02:00
Patrick Ohly	98ba89d31d	resourceclaim controller: avoid caching deleted pod unnecessarily We don't need to remember that a pod got deleted when it had no resource claims because the code which checks the cached UIDs only checks for pods which have resource claims.	2023-07-12 16:57:17 +02:00
Patrick Ohly	fec25785ee	dra: store generated ResourceClaims in cache This addresses the following bad sequence of events: - controller creates ResourceClaim - updating pod status fails - pod gets retried before the informer receives the created ResourceClaim - another ResourceClaim gets created Storing the generated ResourceClaim in a MutationCache ensures that the controller knows about it during the retry. A positive side effect is that ResourceClaims now get indexed by pod owner and thus iterating over existing ones becomes a bit more efficient.	2023-07-11 14:23:49 +02:00
Patrick Ohly	444d23bd2f	dra: generated name for ResourceClaim from template Generating the name avoids all potential name collisions. It's not clear how much of a problem that was because users can avoid them and the deterministic names for generic ephemeral volumes have not led to reports from users. But using generated names is not too hard either. What makes it relatively easy is that the new pod.status.resourceClaimStatus map stores the generated name for kubelet and node authorizer, i.e. the information in the pod is sufficient to determine the name of the ResourceClaim. The resource claim controller becomes a bit more complex and now needs permission to modify the pod status. The new failure scenario of "ResourceClaim created, updating pod status fails" is handled with the help of a new special "resource.kubernetes.io/pod-claim-name" annotation that together with the owner reference identifies exactly for what a ResourceClaim was generated, so updating the pod status can be retried for existing ResourceClaims. The transition from deterministic names is handled with a special case for that recovery code path: a ResourceClaim with no annotation and a name that follows the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod claim and gets added to the pod status. There's no immediate need for it, but just in case that it may become relevant, the name of the generated ResourceClaim may also be left unset to record that no claim was needed. Components processing such a pod can skip whatever they normally would do for the claim. To ensure that they do and also cover other cases properly ("no known field is set", "must check ownership"), resourceclaim.Name gets extended.	2023-07-11 14:23:48 +02:00


63 commits