* DRA resource claim controller: configurable number of workers
It might never be necessary to change the default, but it is hard to be sure.
It's better to have the option, just in case.
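A minimal sketch of how such an option is typically wired up, assuming the usual --concurrent-*-syncs flag naming convention; this is illustrative, not the actual kube-controller-manager code:

    package options

    import "github.com/spf13/pflag"

    // ResourceClaimControllerOptions holds the controller's configurable settings.
    type ResourceClaimControllerOptions struct {
        // ConcurrentResourceClaimSyncs is the number of worker goroutines.
        ConcurrentResourceClaimSyncs int32
    }

    // AddFlags registers the worker count with the given flag set.
    func (o *ResourceClaimControllerOptions) AddFlags(fs *pflag.FlagSet) {
        fs.Int32Var(&o.ConcurrentResourceClaimSyncs, "concurrent-resource-claim-syncs",
            o.ConcurrentResourceClaimSyncs,
            "The number of ResourceClaim objects that are allowed to sync concurrently.")
    }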
* generate files
* resourceclaimcontroller: normalize validation error message
* Update cmd/kube-controller-manager/app/options/resourceclaimcontroller.go
---------
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whichever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
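A minimal sketch, not the actual helper code, of how a driver can pick whichever resource.k8s.io version the cluster has enabled by probing API discovery, preferring the newest:

    package main

    import (
        "fmt"

        "k8s.io/client-go/discovery"
    )

    // pickResourceAPIVersion returns the newest enabled resource.k8s.io
    // group version, mirroring the fallback order described above.
    func pickResourceAPIVersion(dc discovery.DiscoveryInterface) (string, error) {
        for _, gv := range []string{"resource.k8s.io/v1", "resource.k8s.io/v1beta2", "resource.k8s.io/v1beta1"} {
            if _, err := dc.ServerResourcesForGroupVersion(gv); err == nil {
                return gv, nil
            }
        }
        return "", fmt.Errorf("no supported resource.k8s.io API version enabled")
    }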
The context is used for cancellation and to support contextual logging.
In most cases, alternative *WithContext APIs get added, except for
NewIntegerResourceVersionMutationCache where code searches indicate that the
API is not used downstream.
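The general pattern looks roughly like this (hypothetical names, shown only to illustrate how a *WithContext variant becomes the primary entry point while the old one delegates to it):

    package example

    import (
        "context"

        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/klog/v2"
    )

    // Run is the old entry point, kept for backwards compatibility.
    func Run(stopCh <-chan struct{}) {
        RunWithContext(wait.ContextForChannel(stopCh))
    }

    // RunWithContext supports cancellation and contextual logging.
    func RunWithContext(ctx context.Context) {
        logger := klog.FromContext(ctx)
        logger.Info("Started")
        <-ctx.Done()
    }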
An API break around SharedInformer couldn't be avoided because the
alternative (keeping the interface unchanged and adding a second one with
the new method) would have been worse. controller-runtime needs to be updated
because it implements that interface in a test package. Downstream consumers of
controller-runtime will keep working unless they use that test package.
Converting Kubernetes to use the other new alternatives will follow. In the
meantime, usage of the new alternatives cannot be enforced via logcheck
yet (see https://github.com/kubernetes/kubernetes/issues/126379 for the
process).
Passing context through and checking it for cancellation is tricky for event
handlers. A better approach is to map the context cancellation to the normal
removal of an event handler via a helper goroutine. Thanks to the new
HandleErrorWithLogr and HandleCrashWithLogr, remembering the logger is
sufficient for handling problems at runtime.
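A minimal sketch of that approach, assuming client-go's SharedInformer API; watchUntilCanceled is a made-up helper name for illustration:

    package example

    import (
        "context"

        utilruntime "k8s.io/apimachinery/pkg/util/runtime"
        "k8s.io/client-go/tools/cache"
        "k8s.io/klog/v2"
    )

    // watchUntilCanceled registers an event handler and removes it again
    // when the context is canceled, so the handler itself never needs to
    // check the context.
    func watchUntilCanceled(ctx context.Context, informer cache.SharedInformer, handler cache.ResourceEventHandler) error {
        logger := klog.FromContext(ctx)
        handle, err := informer.AddEventHandler(handler)
        if err != nil {
            return err
        }
        go func() {
            <-ctx.Done()
            if err := informer.RemoveEventHandler(handle); err != nil {
                // Remembering the logger is all that is needed here.
                utilruntime.HandleErrorWithLogr(logger, err, "removing event handler")
            }
        }()
        return nil
    }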
Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.
There is one (entirely theoretical!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretical because drivers are only starting to support admin access with 1.32,
so there shouldn't be any claims where this problem could occur.
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
field gets cleared, and the same happens in the status. In both cases there
is one special scenario: if the field was already set and a claim or claim
template gets updated, then the field is not cleared (see the sketch below).
Also, allocating a claim with admin access is allowed regardless of the
feature gate, and the field is not cleared in that case either. In practice,
the scheduler will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
with the field set gets rejected. This prevents running workloads
which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
allocated. The effect is the same.
The alternative would have been to ignore the fields in the claim controller and
scheduler. That would be bad because a monitoring workload would then run anyway,
blocking resources that were probably meant for production workloads.
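A sketch of the apiserver-side clearing mentioned in the first bullet above, following the usual drop-disabled-fields pattern for feature-gated API fields (simplified; dropDisabledAdminAccess and adminAccessInUse are illustrative names, not the actual strategy code):

    package resourceclaim

    import (
        resourceapi "k8s.io/api/resource/v1beta1"
        utilfeature "k8s.io/apiserver/pkg/util/feature"
        "k8s.io/kubernetes/pkg/features"
    )

    // dropDisabledAdminAccess clears the adminAccess fields when the feature
    // gate is disabled, unless the old object already used them, so that
    // updates to existing claims keep the field.
    func dropDisabledAdminAccess(newClaim, oldClaim *resourceapi.ResourceClaim) {
        if utilfeature.DefaultFeatureGate.Enabled(features.DRAAdminAccess) ||
            adminAccessInUse(oldClaim) {
            return
        }
        for i := range newClaim.Spec.Devices.Requests {
            newClaim.Spec.Devices.Requests[i].AdminAccess = nil
        }
    }

    // adminAccessInUse reports whether any request in the existing claim
    // already sets the field.
    func adminAccessInUse(claim *resourceapi.ResourceClaim) bool {
        if claim == nil {
            return false
        }
        for _, request := range claim.Spec.Devices.Requests {
            if request.AdminAccess != nil {
                return true
            }
        }
        return false
    }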
These metrics can provide insights into ResourceClaim usage. The total count is
redundant because the apiserver also provides object counts per resource type, but
having it in the same sub-system next to the count of allocated claims might make
it more discoverable, and it helps monitor the controller itself.
This removes the DRAControlPlaneController feature gate, the fields controlled
by it (claim.spec.controller, claim.status.deallocationRequested,
claim.status.allocation.controller, class.spec.suitableNodes), the
PodSchedulingContext type, and all code related to the feature.
The feature gets removed because there is no path towards beta and GA, and DRA
with "structured parameters" should be able to replace it.
The resource claim controller is completely agnostic to the claim spec. It
doesn't care about classes or devices, therefore it needs no changes in 1.31
besides the v1alpha2 -> v1alpha3 renaming from a previous commit.
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
This is in preparation for completely revamping the resource.k8s.io API group. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.
Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.
Only source code where the version really matters (like API registration)
retains the versioned import.
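For example, a typical file now imports the package like this, so the next version bump touches only the import path:

    // In most files:
    import resourceapi "k8s.io/api/resource/v1alpha3"

    // Where the version really matters (e.g. API registration):
    import resourcev1alpha3 "k8s.io/api/resource/v1alpha3"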
This makes the API nicer:
    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim
Previously, this was:
    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim
A longer-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.
This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
When allocation was done by the scheduler, the controller needs to do the
deallocation because there is no control-plane controller which could react to
"DeallocationRequested".
- Increase the verbosity level of the broadcaster's logging to 3 so that users can filter out event messages by running with a lower logging level. This reduces information noise.
- Make sure the context is properly injected into the broadcaster. This allows the -v flag value to be used also in that broadcaster, rather than the global value above (see the sketch after this list).
- test: use cancellation from ktesting
- golangci-hints: checked error return value
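A sketch of the broadcaster setup with context injection; record.WithContext exists in recent client-go releases, although the exact call site shown here is an assumption, not the code from this change:

    package example

    import (
        "context"

        "k8s.io/client-go/tools/record"
    )

    // newBroadcaster derives the broadcaster's logger from ctx, so the
    // caller's -v setting applies instead of the global default.
    func newBroadcaster(ctx context.Context) record.EventBroadcaster {
        return record.NewBroadcaster(record.WithContext(ctx))
    }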
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".
Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
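A simplified sketch of the fix (not the exact controller code): the claim is created via generateName, and the trailing hyphen keeps the random suffix that the apiserver appends separate from the claim name:

    package example

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // newClaimObjectMeta builds the metadata for a generated ResourceClaim.
    func newClaimObjectMeta(pod *v1.Pod, podClaimName string) metav1.ObjectMeta {
        return metav1.ObjectMeta{
            // Before: pod.Name + "-" + podClaimName -> "my-pod-resource-1ax6zgt"
            // After, with the extra hyphen:         -> "my-pod-resource-1a-x6zgt"
            GenerateName: pod.Name + "-" + podClaimName + "-",
            Namespace:    pod.Namespace,
        }
    }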