kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-02-15 00:37:52 -05:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	8ab1bc1633	Merge pull request #135725 from bart0sh/PR211-add-extended-resources-test-cases Fix extended resource handling for DRA-backed resources on pod admission	2026-01-08 04:03:42 +05:30
Kubernetes Prow Robot	4e69edd0ee	Merge pull request #135392 from brejman/issue-134393-nominated-nodes Fix queue hint for plugins on change to pods with nominated nodes	2026-01-07 20:05:38 +05:30
Ed Bartosh	c2361491f5	Fix extended resource handling for DRA-backed resources In kubelet admission: - Remove extended resources from pod requirements if they are either backed by DRA or not present in node's allocatable resources In scheduler (fit.go): - Remove fallback logic that delegated all resources to DRA when draManager is nil These changes ensure that: - DRA-backed extended resources are properly handled during pod admission - DevicePlugin-backed extended resources still follow standard admission rules	2026-01-02 16:08:49 +02:00
Patrick Ohly	7a4d650125	DRA extended resources: fix flake in unit tests The tests assumed that instantiating a DRAManager followed by informerFactory.WaitForCacheSync would be enough to have the manager up-to-date, but that's not correct: the test only waits for informer caches to be synced, but syncing event handlers like the one in the manager may still be going on. The flake rate is low, though: $ GOPATH/bin/stress -p 256 ./noderesources.test 5s: 0 runs so far, 0 failures, 256 active 10s: 256 runs so far, 0 failures, 256 active 15s: 256 runs so far, 0 failures, 256 active 20s: 512 runs so far, 0 failures, 256 active 25s: 567 runs so far, 0 failures, 256 active 30s: 771 runs so far, 0 failures, 256 active /tmp/go-stress-20251226T181044-974980161 --- FAIL: TestCalculateResourceAllocatableRequest (0.81s) --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s) extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name" extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name" extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something" resource_allocation_test.go:595: Expected requested=2, but got requested=1 FAIL It becomes higher when changing WaitForCacheSync such that it doesn't poll and therefore returns more promptly, which is where this flake was first observed. The fix is to run the test in a synctest bubble where Wait can be used to wait for all background activity, including event handling, to be finished before proceeding with the test. synctest is less forgiving about lingering goroutines. A synctest bubble must wait for goroutines to stop, which in this case means that there has to be a way to wait for the metric recorder shutdown. Event handlers have to be removed. This could be done with plain Go, but here test/utils/ktesting is used instead because it offers some advantages: - less boilerplate code - automatic cancellation of the context (i.e. less manual context.WithCancel) - tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting sub-tests. synctest itself needs another anonymous function, which makes the line too long and forces re-indentation: t.Run(... func(...) { synctest.Test(... func() { }) }) For the sake of consistency all tests get updated. While at it, some code gets improved: - t.Fatal(err) is not a good way to report an error because there is no additional markup in the test output that indicates that there was an unexpected error. It just logs err.Error(), which might not be very informative and/or obvious. - newTestDRAManager aborts in case of a failure instead of returning an error.	2025-12-27 09:47:56 +01:00
Bartosz	3b4f0be6e3	Check NominatedNodeName to decide if a pod is scheduled	2025-12-19 12:30:06 +00:00
Ed Bartosh	1820dc7535	Fit tests: add DRA-aware test cases	2025-12-12 15:48:18 +02:00
Ed Bartosh	7860effc2c	resourceAllocationScorer: add unit test for DRA nodeMatches	2025-12-12 15:48:13 +02:00
Ed Bartosh	02a39d6c1e	Balanced allocation tests: cover DRA resources - Added DRA-aware test cases - Pulled shared DRA setup out into helper to keep tests DRY - Added SignPod test	2025-12-12 13:51:19 +02:00
bwsalmon	854e67bb51	KEP 5598: Opportunistic Batching (#135231) * First version of batching w/out signatures. * First version of pod signatures. * Integrate batching with signatures. * Fix merge conflicts. * Fixes from self-review. * Test fixes. * Fix a bug that limited batches to size 2 Also add some new high-level logging and simplify the pod affinity signature. * Re-enable batching on perf tests for now. * fwk.NewStatus(fwk.Success) * Review feedback. * Review feedback. * Comment fix. * Two plugin specific unit tests. * Add cycle state to the sign call, apply to topo spread. Also add unit tests for several plugin signature calls. * Review feedback. * Switch to distinct stats for hint and store calls. * Switch signature from string to []byte * Revert cyclestate in signs. Update node affinity. Node affinity now sorts all of the various nested arrays in the structure. CycleState no longer in signature; revert to signing fewer cases for pod spread. * hack/update-vendor.sh * Disable signatures when extenders are configured. * Update pkg/scheduler/framework/runtime/batch.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update staging/src/k8s.io/kube-scheduler/framework/interface.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Review feedback. * Disable node resource signatures when extended DRA enabled. * Review feedback. * Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/interface.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/runtime/batch.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Review feedback. * Fixes for review suggestions. * Add integration tests. * Linter fixes, test fix. * Whitespace fix. * Remove broken test. * Unschedulable test. * Remove go.mod changes. --------- Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>	2025-11-12 21:51:37 -08:00
ndixita	7645eb70e9	Scheduler changes to support pod level resources in place resize	2025-11-11 18:15:22 +00:00
Ed Bartosh	fc404b6a3d	Cache DRA state for scoring extended resources Extend Fit and BalancedAllocation PreScore state with the allocated state, the list of ResourceSlices and the device class mapping. Gather these once during PreScore and pass them through the scoring path instead of re-fetching for every scoring call. This should speed up scoring of DRA extended resources, lowering scheduling overhead. Co-authored-by: Patrick Ohly <patrick.ohly@intel.com> Co-authored-by: Maciej Skoczeń <mskoczen@google.com> Co-authored-by: Dominik Marciński <gmidon@gmail.com>	2025-11-06 18:09:11 +02:00
Ed Bartosh	edbc32fa60	DRA: implement scoring for extended resources Updated extended resource allocation scorer to calculate allocatable and requested values for DRA-backed resources.	2025-11-06 10:40:52 +02:00
Kubernetes Prow Robot	cf37f0bf49	Merge pull request #135037 from yliaog/extendedresourcecache pick one device class deterministically for extended resource	2025-11-05 14:16:58 -08:00
yliao	c67937dd35	switched from storing name to storing a pointer to the device class.	2025-11-04 17:51:12 +00:00
yliao	2e479e00f4	refactored the hint function, added test cases	2025-11-04 16:31:57 +00:00
yliao	14f17a3809	addressed review feedback	2025-11-03 22:53:27 +00:00
yliao	b609d4713c	added integration test case	2025-11-03 21:27:41 +00:00
yliao	7aa849160a	added queue hint function	2025-11-03 21:27:41 +00:00
yliao	3b905ae4b5	added device class add/update events to noderesources plugin when DRAExtendedResource feature is enabled	2025-11-03 21:27:41 +00:00
yliao	3eab698884	fixed unit test and integration test failures Fix minor nits Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>	2025-11-03 20:07:01 +05:30
Sai Ramesh Vanka	d8c66ffb63	Add a global cache to support DRA's extended resource to the device class mapping - Add a new interface "DeviceClassResolver" in the scheduler framework - Add a global cache of mapping between the extended resource and the device class - Cache can be leveraged by the k8s api-server, controller-manager along with the scheduler - This change helps in delegating the requests to the dynamicresource plugin based on the mapping during the node update events and thus avoiding an extra scheduling cycle Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>	2025-11-03 12:31:16 +05:30
Ed Bartosh	1cb45e2a27	DRA: fix scheduling of pods with extended resources Previously, the scheduler assumed an extended resource was maintained by a device plugin if its name was present in the node's Allocatable map, even if its value was zero. This blocked scheduling when a device plugin was disconnected or uninstalled, because Kubelet still reported the resource with Allocatable=0. This change adds a check for the actual allocatable value in addition to a key presence check, allowing nodes with uninstalled device plugins to be considered for scheduling.	2025-10-27 16:24:29 +02:00
yliao	79f8d1b1c5	fixed bug such that implicit extended resource name can always be used, no matter whether the explicit extendedResourceName field in device class is set or not.	2025-09-10 14:10:40 +00:00
Ania Borowiec	fadb40199f	Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler	2025-09-02 09:42:53 +00:00
yliao	34a64db2c7	extended resource backed by DRA: implementation	2025-07-29 18:55:21 +00:00
Ania Borowiec	aecd37e6fb	Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler	2025-07-24 12:10:58 +00:00
Ania Borowiec	ee8c265d35	Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework	2025-06-30 10:06:22 +00:00
Ania Borowiec	00d3750503	Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190) * Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes apply review comment and fix linter warning * update-vendor.sh * update doc comments * run update-vendor.sh	2025-06-26 08:06:29 -07:00
Davanum Srinivas	03afe6471b	Add a replacement for cmp.Diff using json+go-difflib Co-authored-by: Jordan Liggitt <jordan@liggitt.net> Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2025-06-16 17:10:42 -04:00
Kubernetes Prow Robot	e0859f91b7	Merge pull request #131887 from ania-borowiec/extract_cyclestate_interface Moving Scheduler interfaces to staging: split CycleState into interface and implementation, move interface to staging repo	2025-05-30 04:00:18 -07:00
Ania Borowiec	d75af825fb	Extract interface CycleState and move it to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework	2025-05-29 16:18:36 +00:00
Kubernetes Prow Robot	2a3ca42c91	Merge pull request #131345 from haosdent/haosdent/return-unresolvable-when-exceed-node-resources scheduler: return UnschedulableAndUnresolvable when node capacity is insufficient	2025-05-14 05:13:25 -07:00
Kubernetes Prow Robot	8a6b916765	Merge pull request #130720 from saintube/scheduler-expose-nodeinfo-in-prefilter Expose NodeInfo to PreFilter plugins	2025-04-23 13:31:29 -07:00
Haosdent Huang	f63702de0f	scheduler: return UnschedulableAndUnresolvable when node capacity is insufficient Currently, the NodeResourcesFit plugin always returns Unschedulable when a pod's resource requests exceed a node's available resources. However, when a pod's requests exceed the node's total allocatable, preemption cannot help since even an empty node would not have enough resources. This change modifies the NodeResourcesFit plugin to return UnschedulableAndUnresolvable when a pod's resource requests exceed the node's total allocatable. This helps optimize the scheduling process in large clusters by: 1. Reducing the number of candidate nodes that need to be considered for preemption 2. Providing clearer feedback about unresolvable resource constraints 3. Improving scheduling performance by avoiding unnecessary preemption calculations The change is particularly beneficial in heterogeneous clusters where node sizes vary significantly, as it helps quickly identify nodes that are fundamentally too small for certain pods. Fixes https://github.com/kubernetes/kubernetes/issues/131310 Co-authored-by: Kensei Nakada <handbomusic@gmail.com>	2025-04-22 14:54:40 +08:00
saintube	8dc6806d26	Expose NodeInfo to PreFilter plugins and Framework Co-authored-by: Zhan Sheng <49895476+AxeZhan@users.noreply.github.com> Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com> Signed-off-by: saintube <saintube@foxmail.com>	2025-03-21 14:55:25 +08:00
Kubernetes Prow Robot	838f3c0852	Merge pull request #130577 from KevinTMtz/pod-level-hugepages [PodLevelResources] Pod Level Hugepage Resources	2025-03-20 15:34:38 -07:00
Kevin Torres	b9e0d4ad66	Unit tests for pod level hugepage resources	2025-03-20 17:54:39 +00:00
dom4ha	4deb4f2b5f	Trigger rescheduling on delete event also when unscheduled pod is removed	2025-03-10 15:03:50 +00:00
Hongqi Yu	d76f40d2f3	fix(scheduler): skip best-effort pods in BalancedAllocation PreScore - Refactored `PreScore` method in `balanced_allocation.go` to skip best-effort pods. - Updated unit tests in `balanced_allocation_test.go` to check for the new status codes.	2025-03-07 13:13:02 +08:00
saintube	afb4e96510	Expose NodeInfo to Score plugins Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com> Signed-off-by: saintube <saintube@foxmail.com>	2025-03-04 17:57:14 +08:00
Kubernetes Prow Robot	facb1a8c55	Merge pull request #129905 from ania-borowiec/129778_replace_equal Replace reflect.DeepEqual with cmp.Diff in pkg/scheduler tests	2025-02-26 08:24:30 -08:00
googs1025	239aad8e4b	chore(scheduler): use framework.Features in scheduler plugins	2025-02-26 19:16:07 +08:00
Ania Borowiec	4205f04ce3	Replace uses of reflect.DeepEqual with cmp.Diff in pkg/scheduler tests	2025-02-26 09:27:51 +00:00
Davanum Srinivas	4e05bc20db	Linter to ensure go-cmp/cmp is used ONLY in tests Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2025-01-24 20:49:14 -05:00
ndixita	6db40446de	Scheduler changes: 1. Use pod-level resource when feature is enabled and resources are set at pod-level 2. Edge case handling: When a pod defines only CPU or memory limits at pod-level (but not both), and container-level requests/limits are unset, the pod-level requests stay empty for the resource without a pod-limit. The container's request for that resource is then set to the default request value from schedutil.	2024-11-08 03:00:54 +00:00
Tim Allclair	81df195819	Stop using status.AllocatedResources to aggregate resources	2024-11-01 14:02:58 -07:00
Kubernetes Prow Robot	aec2ea1877	Merge pull request #124609 from AxeZhan/refac Move some helper functions from api/v1 to component-helpers	2024-10-25 17:26:52 +01:00
AxeZhan	2ffb568540	rename functions	2024-10-25 12:53:24 +08:00
Kensei Nakada	83f9e4b6df	cleanup: remove event list	2024-10-18 11:10:10 +10:00
AxeZhan	b1f07bb36c	add tests for scheduler	2024-10-10 15:53:19 +08:00
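
The status choice described in f63702de0f can be illustrated with a minimal Go sketch. This is not the NodeResourcesFit code: the `Code` type, the `Resources` map, and `fitStatus` are hypothetical stand-ins, and quantities are plain integers rather than Kubernetes resource quantities. It only shows the distinction the commit message draws: a request larger than the node's total allocatable cannot be helped by preemption, while a request larger than the currently free amount might be.

```go
package main

import "fmt"

// Code is a simplified, hypothetical stand-in for a scheduler status code.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

func (c Code) String() string {
	switch c {
	case Success:
		return "Success"
	case Unschedulable:
		return "Unschedulable"
	default:
		return "UnschedulableAndUnresolvable"
	}
}

// Resources maps a resource name to a quantity in arbitrary integer units.
type Resources map[string]int64

// fitStatus compares a pod's request with a node's total allocatable and
// with what other pods already use (assumed illustration, not upstream code).
func fitStatus(request, allocatable, used Resources) Code {
	status := Success
	for name, want := range request {
		if want == 0 {
			continue
		}
		// Even an empty node would be too small: preemption cannot help.
		if want > allocatable[name] {
			return UnschedulableAndUnresolvable
		}
		// The node is large enough but currently too full.
		if want > allocatable[name]-used[name] {
			status = Unschedulable
		}
	}
	return status
}

func main() {
	allocatable := Resources{"cpu": 4000}
	used := Resources{"cpu": 3500}
	fmt.Println(fitStatus(Resources{"cpu": 250}, allocatable, used))
	fmt.Println(fitStatus(Resources{"cpu": 1000}, allocatable, used))
	fmt.Println(fitStatus(Resources{"cpu": 6000}, allocatable, used))
}
```

With 4000 units allocatable and 3500 in use, the three requests print Success, Unschedulable, and UnschedulableAndUnresolvable in that order. A resource missing from the allocatable map reads as zero in this sketch, so any positive request for it is treated as unresolvable.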


222 commits