Commit graph

222 commits

Author SHA1 Message Date
Kubernetes Prow Robot
8ab1bc1633
Merge pull request #135725 from bart0sh/PR211-add-extended-resources-test-cases
Fix extended resource handling for DRA-backed resources on pod admission
2026-01-08 04:03:42 +05:30
Kubernetes Prow Robot
4e69edd0ee
Merge pull request #135392 from brejman/issue-134393-nominated-nodes
Fix queue hint for plugins on change to pods with nominated nodes
2026-01-07 20:05:38 +05:30
Ed Bartosh
c2361491f5 Fix extended resource handling for DRA-backed resources
In kubelet admission:
   - Remove extended resources from pod requirements if they are either
     backed by DRA or not present in node's allocatable resources

In scheduler (fit.go):
   - Remove fallback logic that delegated all resources to DRA when
     draManager is nil

These changes ensure that:
- DRA-backed extended resources are properly handled during pod admission
- DevicePlugin-backed extended resources still follow standard admission rules
2026-01-02 16:08:49 +02:00
Patrick Ohly
7a4d650125 DRA extended resources: fix flake in unit tests
The tests assumed that instantiating a DRAManager followed by
informerFactory.WaitForCacheSync would be enough to have the manager
up-to-date, but that's not correct: the test only waits for informer *caches*
to be synced, but syncing *event handlers* like the one in the manager may
still be going on. The flake rate is low, though:

    $ GOPATH/bin/stress -p 256 ./noderesources.test
    5s: 0 runs so far, 0 failures, 256 active
    10s: 256 runs so far, 0 failures, 256 active
    15s: 256 runs so far, 0 failures, 256 active
    20s: 512 runs so far, 0 failures, 256 active
    25s: 567 runs so far, 0 failures, 256 active
    30s: 771 runs so far, 0 failures, 256 active

    /tmp/go-stress-20251226T181044-974980161
    --- FAIL: TestCalculateResourceAllocatableRequest (0.81s)
        --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s)
            extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name"
            extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name"
            extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something"
            resource_allocation_test.go:595: Expected requested=2, but got requested=1
    FAIL

It becomes higher when changing WaitForCacheSync such that it doesn't poll and
therefore returns more promptly, which is where this flake was first observed.

The fix is to run the test in a synctest bubble where Wait can be used to wait
for all background activity, including event handling, to be finished before
proceeding with the test.

synctest is less forgiving about lingering goroutines. A synctest bubble must
wait for goroutines to stop, which in this case means that there has to be
a way to wait for the metric recorder shutdown. Event handlers have to be
removed.

This could be done with plain Go, but here test/utils/ktesting is used instead
because it offers some advantages:
- less boilerplate code
- automatic cancellation of the context (i.e. less manual context.WithCancel)
- tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting
  sub-tests. synctest itself needs another anonymous function, which makes
  the line too long and forces re-indentation:
     t.Run(... func(...) {
         synctest.Test(... func() {
         })
     })

For the sake of consistency all tests get updated.

While at it, some code gets improved:

- t.Fatal(err) is not a good way to report an error because
  there is no additional markup in the test output that indicates
  that there was an unexpected error. It just logs err.Error(),
  which might not be very informative and/or obvious.
- newTestDRAManager aborts in case of a failure instead of
  returning an error.
2025-12-27 09:47:56 +01:00
Bartosz
3b4f0be6e3
Check NominatedNodeName to decide if a pod is scheduled 2025-12-19 12:30:06 +00:00
Ed Bartosh
1820dc7535 Fit tests: add DRA-aware test cases 2025-12-12 15:48:18 +02:00
Ed Bartosh
7860effc2c resourceAllocationScorer: add unit test for DRA nodeMatches 2025-12-12 15:48:13 +02:00
Ed Bartosh
02a39d6c1e Balanced allocation tests: cover DRA resources
- Added DRA-aware test cases
- Pulled shared DRA setup out into helper to keep tests DRY
- Added SignPod test
2025-12-12 13:51:19 +02:00
bwsalmon
854e67bb51
KEP 5598: Opportunistic Batching (#135231)
* First version of batching w/out signatures.

* First version of pod signatures.

* Integrate batching with signatures.

* Fix merge conflicts.

* Fixes from self-review.

* Test fixes.

* Fix a bug that limited batches to size 2
Also add some new high-level logging and
simplify the pod affinity signature.

* Re-enable batching on perf tests for now.

* fwk.NewStatus(fwk.Success)

* Review feedback.

* Review feedback.

* Comment fix.

* Two plugin-specific unit tests.

* Add cycle state to the sign call, apply to topo spread.
Also add unit tests for several plugin signature
calls.

* Review feedback.

* Switch to distinct stats for hint and store calls.

* Switch signature from string to []byte

* Revert cyclestate in signs. Update node affinity.
Node affinity now sorts all of the various
nested arrays in the structure. CycleState no
longer in signature; revert to signing fewer
cases for pod spread.

* hack/update-vendor.sh

* Disable signatures when extenders are configured.

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update staging/src/k8s.io/kube-scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Disable node resource signatures when extended DRA enabled.

* Review feedback.

* Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Fixes for review suggestions.

* Add integration tests.

* Linter fixes, test fix.

* Whitespace fix.

* Remove broken test.

* Unschedulable test.

* Remove go.mod changes.

---------

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-11-12 21:51:37 -08:00
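The commit above mentions switching the pod signature to []byte and sorting the nested arrays of the node affinity structure. A minimal sketch of why sorting matters is shown below: two logically equal requirement sets listed in different orders should produce identical bytes. The `requirement` type, the `sign` function and the JSON encoding are assumptions made for illustration only; they are not the scheduler's actual types or encoding.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// requirement is a hypothetical stand-in for one node affinity term.
type requirement struct {
	Key    string   `json:"key"`
	Values []string `json:"values"`
}

// sign returns an order-independent signature for a set of requirements.
// It copies the input, sorts the values inside each requirement, then sorts
// the requirements themselves, and finally encodes the canonical form.
func sign(reqs []requirement) ([]byte, error) {
	canon := make([]requirement, len(reqs))
	for i, r := range reqs {
		vals := append([]string(nil), r.Values...) // do not reorder caller data
		sort.Strings(vals)
		canon[i] = requirement{Key: r.Key, Values: vals}
	}
	sort.Slice(canon, func(i, j int) bool {
		if canon[i].Key != canon[j].Key {
			return canon[i].Key < canon[j].Key
		}
		// Tie-break on the sorted values so equal keys order deterministically.
		return strings.Join(canon[i].Values, ",") < strings.Join(canon[j].Values, ",")
	})
	return json.Marshal(canon)
}

func main() {
	a, _ := sign([]requirement{
		{Key: "zone", Values: []string{"b", "a"}},
		{Key: "arch", Values: []string{"amd64"}},
	})
	b, _ := sign([]requirement{
		{Key: "arch", Values: []string{"amd64"}},
		{Key: "zone", Values: []string{"a", "b"}},
	})
	fmt.Println("signatures equal:", bytes.Equal(a, b))
}
```

Without the sorting step, the two inputs in `main` would encode differently and the pods would not be recognised as equivalent for batching.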
ndixita
7645eb70e9
Scheduler changes to support pod level resources in place resize 2025-11-11 18:15:22 +00:00
Ed Bartosh
fc404b6a3d Cache DRA state for scoring extended resources
Extend Fit and BalancedAllocation PreScore state with the
allocated state, the list of ResourceSlices and the device class
mapping. Gather these once during PreScore and pass them through
the scoring path instead of re-fetching for every scoring call.

This should speed up scoring of DRA extended resources, lowering
scheduling overhead.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Maciej Skoczeń <mskoczen@google.com>
Co-authored-by: Dominik Marciński <gmidon@gmail.com>
2025-11-06 18:09:11 +02:00
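The idea in the commit above, gathering DRA state once during PreScore and reusing it for every node, can be sketched roughly as follows. All names here (`draState`, `source`, `scoreAll`, `freeDevices`) are hypothetical, and the flat device list only stands in for the ResourceSlices; the real plugin state is richer.

```go
package main

import "fmt"

// device is a simplified stand-in for an entry in a ResourceSlice.
type device struct {
	node string
	name string
}

// draState is the snapshot gathered once per scheduling cycle.
type draState struct {
	devices   []device        // stands in for the list of ResourceSlices
	allocated map[string]bool // device name -> already allocated
}

// source simulates the expensive lookup and counts how often it is used.
type source struct {
	devices   []device
	allocated map[string]bool
	fetches   int
}

func (s *source) fetch() draState {
	s.fetches++
	return draState{devices: s.devices, allocated: s.allocated}
}

// freeDevices scores one node from the cached snapshot, without re-fetching.
func freeDevices(node string, st draState) int {
	free := 0
	for _, d := range st.devices {
		if d.node == node && !st.allocated[d.name] {
			free++
		}
	}
	return free
}

// scoreAll fetches the state once (the "PreScore" step) and then passes the
// same snapshot to every per-node scoring call.
func scoreAll(src *source, nodes []string) map[string]int {
	st := src.fetch()
	scores := make(map[string]int, len(nodes))
	for _, n := range nodes {
		scores[n] = freeDevices(n, st)
	}
	return scores
}

func main() {
	src := &source{
		devices:   []device{{node: "a", name: "d1"}, {node: "a", name: "d2"}, {node: "b", name: "d3"}},
		allocated: map[string]bool{"d2": true},
	}
	fmt.Println(scoreAll(src, []string{"a", "b"}), "fetches:", src.fetches)
}
```

The number of fetches stays at one regardless of how many nodes are scored, which is the overhead reduction the commit message describes.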
Ed Bartosh
edbc32fa60 DRA: implement scoring for extended resources
Updated extended resource allocation scorer to calculate
allocatable and requested values for DRA-backed resources.
2025-11-06 10:40:52 +02:00
Kubernetes Prow Robot
cf37f0bf49
Merge pull request #135037 from yliaog/extendedresourcecache
pick one device class deterministically for extended resource
2025-11-05 14:16:58 -08:00
yliao
c67937dd35 switched from storing name to storing a pointer to the device class. 2025-11-04 17:51:12 +00:00
yliao
2e479e00f4 refactored the hint function, added test cases 2025-11-04 16:31:57 +00:00
yliao
14f17a3809 addressed review feedback 2025-11-03 22:53:27 +00:00
yliao
b609d4713c added integration test case 2025-11-03 21:27:41 +00:00
yliao
7aa849160a added queue hint function 2025-11-03 21:27:41 +00:00
yliao
3b905ae4b5 added device class add/update events to noderesources plugin when DRAExtendedResource feature is enabled 2025-11-03 21:27:41 +00:00
yliao
3eab698884 fixed unit test and integration test failures
Fix minor nits

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
2025-11-03 20:07:01 +05:30
Sai Ramesh Vanka
d8c66ffb63 Add a global cache to support DRA's extended resource to the device
class mapping

- Add a new interface "DeviceClassResolver" in the scheduler framework
- Add a global cache of mapping between the extended resource and the
  device class
- Cache can be leveraged by the k8s api-server, controller-manager along with the scheduler
- This change helps in delegating the requests to the dynamicresource
  plugin based on the mapping during the node update events and thus
  avoiding an extra scheduling cycle

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
2025-11-03 12:31:16 +05:30
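A mapping cache of the kind described above might look roughly like the sketch below. It is an illustration under assumptions: the `cache`, `deviceClass`, `add` and `resolve` names are hypothetical and do not reproduce the DeviceClassResolver interface. The default name prefix is taken from the test log quoted elsewhere in this history (`deviceclass.resource.kubernetes.io/device-class-name`). The sketch does not address what happens when several device classes claim the same explicit name.

```go
package main

import "fmt"

// defaultPrefix forms the implicit extended resource name of a device class.
const defaultPrefix = "deviceclass.resource.kubernetes.io/"

// deviceClass is a simplified stand-in for the DeviceClass API object.
type deviceClass struct {
	Name                 string
	ExtendedResourceName string // optional explicit mapping
}

// cache maps extended resource names to device class names.
type cache struct {
	byResource map[string]string
}

func newCache() *cache {
	return &cache{byResource: map[string]string{}}
}

// add records the explicit mapping (if set) and always records the default
// mapping, so the implicit name works whether or not an explicit one exists.
func (c *cache) add(dc deviceClass) {
	if dc.ExtendedResourceName != "" {
		c.byResource[dc.ExtendedResourceName] = dc.Name
	}
	c.byResource[defaultPrefix+dc.Name] = dc.Name
}

// resolve returns the device class backing an extended resource, if any.
func (c *cache) resolve(resource string) (string, bool) {
	name, ok := c.byResource[resource]
	return name, ok
}

func main() {
	c := newCache()
	c.add(deviceClass{Name: "device-class-name", ExtendedResourceName: "extended.resource.dra.io/something"})

	for _, r := range []string{
		"extended.resource.dra.io/something",
		"deviceclass.resource.kubernetes.io/device-class-name",
		"example.com/unknown",
	} {
		name, ok := c.resolve(r)
		fmt.Printf("%s -> %q (found=%v)\n", r, name, ok)
	}
}
```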
Ed Bartosh
1cb45e2a27 DRA: fix scheduling of pods with extended resources
Previously, the scheduler assumed an extended resource was maintained
by a device plugin if its name was present in the node's Allocatable
map, even if its value was zero. This blocked scheduling when a device
plugin was disconnected or uninstalled, because Kubelet still reported
the resource with Allocatable=0.

This change adds a check for the actual allocatable value in addition
to a key presence check, allowing nodes with uninstalled device
plugins to be considered for scheduling.
2025-10-27 16:24:29 +02:00
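The check described in the commit above can be illustrated with a small sketch. The function name and the plain `map[string]int64` are assumptions for illustration; the scheduler works with its own resource types.

```go
package main

import "fmt"

// isDevicePluginBacked reports whether an extended resource should be treated
// as provided by a device plugin on this node: the name must be present in
// the allocatable map AND its value must be greater than zero.
// Hypothetical helper, not the scheduler's actual function.
func isDevicePluginBacked(allocatable map[string]int64, resource string) bool {
	quantity, ok := allocatable[resource]
	return ok && quantity > 0
}

func main() {
	allocatable := map[string]int64{
		"example.com/gpu":  0, // plugin uninstalled, kubelet still reports the key
		"example.com/fpga": 2,
	}
	for _, name := range []string{"example.com/gpu", "example.com/fpga", "example.com/missing"} {
		fmt.Printf("%s device-plugin-backed: %v\n", name, isDevicePluginBacked(allocatable, name))
	}
}
```

With a presence-only check, `example.com/gpu` would still count as device-plugin-backed and the node would be rejected; with the value check it no longer blocks the DRA path.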
yliao
79f8d1b1c5 fixed bug such that implicit extended resource name can always be used,
regardless of whether the explicit extendedResourceName field in the device class is set.
2025-09-10 14:10:40 +00:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Ania Borowiec
aecd37e6fb
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Ania Borowiec
ee8c265d35
Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503
Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Davanum Srinivas
03afe6471b
Add a replacement for cmp.Diff using json+go-difflib
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-06-16 17:10:42 -04:00
Kubernetes Prow Robot
e0859f91b7
Merge pull request #131887 from ania-borowiec/extract_cyclestate_interface
Moving Scheduler interfaces to staging: split CycleState into interface and implementation, move interface to staging repo
2025-05-30 04:00:18 -07:00
Ania Borowiec
d75af825fb
Extract interface CycleState and move it to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework 2025-05-29 16:18:36 +00:00
Kubernetes Prow Robot
2a3ca42c91
Merge pull request #131345 from haosdent/haosdent/return-unresolvable-when-exceed-node-resources
scheduler: return UnschedulableAndUnresolvable when node capacity is insufficient
2025-05-14 05:13:25 -07:00
Kubernetes Prow Robot
8a6b916765
Merge pull request #130720 from saintube/scheduler-expose-nodeinfo-in-prefilter
Expose NodeInfo to PreFilter plugins
2025-04-23 13:31:29 -07:00
Haosdent Huang
f63702de0f scheduler: return UnschedulableAndUnresolvable when node capacity is insufficient
Currently, the NodeResourcesFit plugin always returns Unschedulable when a pod's
resource requests exceed a node's available resources. However, when a pod's
requests exceed the node's total allocatable, preemption cannot help since even
an empty node would not have enough resources.

This change modifies the NodeResourcesFit plugin to return UnschedulableAndUnresolvable
when a pod's resource requests exceed the node's total allocatable. This helps
optimize the scheduling process in large clusters by:
1. Reducing the number of candidate nodes that need to be considered for preemption
2. Providing clearer feedback about unresolvable resource constraints
3. Improving scheduling performance by avoiding unnecessary preemption calculations

The change is particularly beneficial in heterogeneous clusters where node sizes
vary significantly, as it helps quickly identify nodes that are fundamentally
too small for certain pods.

Fixes https://github.com/kubernetes/kubernetes/issues/131310

Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
2025-04-22 14:54:40 +08:00
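The distinction drawn in the commit above can be sketched for a single resource as follows. The `code` type, its constants and `fitStatus` are local illustrations named after the status codes in the message; they are not the framework's types.

```go
package main

import "fmt"

type code string

const (
	success                      code = "Success"
	unschedulable                code = "Unschedulable"
	unschedulableAndUnresolvable code = "UnschedulableAndUnresolvable"
)

// fitStatus classifies a single-resource request against a node.
//   - request > allocatable: even an empty node is too small, so preemption
//     cannot help (unresolvable).
//   - request > allocatable - used: the node is currently too full, but
//     evicting pods might free enough capacity (plain unschedulable).
func fitStatus(request, allocatable, used int64) code {
	switch {
	case request > allocatable:
		return unschedulableAndUnresolvable
	case request > allocatable-used:
		return unschedulable
	default:
		return success
	}
}

func main() {
	fmt.Println(fitStatus(16, 8, 0)) // larger than the whole node
	fmt.Println(fitStatus(6, 8, 4))  // fits the node, but not right now
	fmt.Println(fitStatus(2, 8, 4))  // fits
}
```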
saintube
8dc6806d26 Expose NodeInfo to PreFilter plugins and Framework
Co-authored-by: Zhan Sheng <49895476+AxeZhan@users.noreply.github.com>
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-21 14:55:25 +08:00
Kubernetes Prow Robot
838f3c0852
Merge pull request #130577 from KevinTMtz/pod-level-hugepages
[PodLevelResources] Pod Level Hugepage Resources
2025-03-20 15:34:38 -07:00
Kevin Torres
b9e0d4ad66 Unit tests for pod level hugepage resources 2025-03-20 17:54:39 +00:00
dom4ha
4deb4f2b5f Trigger rescheduling on delete event also when unscheduled pod is removed 2025-03-10 15:03:50 +00:00
Hongqi Yu
d76f40d2f3 fix(scheduler): skip best-effort pods in BalancedAllocation PreScore
- Refactored `PreScore` method in `balanced_allocation.go` to skip
  best-effort pods.
- Updated unit tests in `balanced_allocation_test.go` to check for
  the new status codes.
2025-03-07 13:13:02 +08:00
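A rough sketch of the skip described above is given below. It assumes that "best-effort" means no container sets any request or limit (a simplification of the QoS definition) and that the plugin signals this with a Skip-style status; the commit message only says "new status codes", so the exact code is an assumption. All type and function names are hypothetical.

```go
package main

import "fmt"

// container and pod are simplified stand-ins for the API types.
type container struct {
	requests map[string]int64
	limits   map[string]int64
}

type pod struct {
	name       string
	containers []container
}

type status string

const (
	statusSuccess status = "Success"
	statusSkip    status = "Skip" // assumed code; tells the caller to skip scoring
)

// isBestEffort reports whether no container sets any request or limit.
func isBestEffort(p pod) bool {
	for _, c := range p.containers {
		if len(c.requests) > 0 || len(c.limits) > 0 {
			return false
		}
	}
	return true
}

// preScore returns early for best-effort pods: with nothing requested there
// is nothing to balance, so per-node scoring would be wasted work.
func preScore(p pod) status {
	if isBestEffort(p) {
		return statusSkip
	}
	return statusSuccess
}

func main() {
	bestEffort := pod{name: "be", containers: []container{{}}}
	burstable := pod{name: "bu", containers: []container{{requests: map[string]int64{"cpu": 100}}}}
	fmt.Println(bestEffort.name, preScore(bestEffort))
	fmt.Println(burstable.name, preScore(burstable))
}
```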
saintube
afb4e96510 Expose NodeInfo to Score plugins
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-04 17:57:14 +08:00
Kubernetes Prow Robot
facb1a8c55
Merge pull request #129905 from ania-borowiec/129778_replace_equal
Replace reflect.DeepEqual with cmp.Diff in pkg/scheduler tests
2025-02-26 08:24:30 -08:00
googs1025
239aad8e4b chore(scheduler): use framework.Features in scheduler plugins 2025-02-26 19:16:07 +08:00
Ania Borowiec
4205f04ce3
Replace uses of reflect.DeepEqual with cmp.Diff in pkg/scheduler tests 2025-02-26 09:27:51 +00:00
Davanum Srinivas
4e05bc20db
Linter to ensure go-cmp/cmp is used ONLY in tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-01-24 20:49:14 -05:00
ndixita
6db40446de Scheduler changes:
1. Use pod-level resource when feature is enabled and resources are set at pod-level
2. Edge case handling: When a pod defines only CPU or memory limits at pod-level (but not both), and container-level requests/limits are unset, the pod-level requests stay empty for the resource without a pod-limit. The container's request for that resource is then set to the default request value from schedutil.
2024-11-08 03:00:54 +00:00
Tim Allclair
81df195819 Stop using status.AllocatedResources to aggregate resources 2024-11-01 14:02:58 -07:00
Kubernetes Prow Robot
aec2ea1877
Merge pull request #124609 from AxeZhan/refac
Move some helper functions from api/v1 to component-helpers
2024-10-25 17:26:52 +01:00
AxeZhan
2ffb568540 rename functions 2024-10-25 12:53:24 +08:00
Kensei Nakada
83f9e4b6df cleanup: remove event list 2024-10-18 11:10:10 +10:00
AxeZhan
b1f07bb36c add tests for scheduler 2024-10-10 15:53:19 +08:00