Commit graph

1450 commits

Author SHA1 Message Date
Antoni Zawodny
833b7205fc Run PreBind plugins in parallel if feasible 2026-01-11 14:19:18 +01:00
Kubernetes Prow Robot
8ab1bc1633
Merge pull request #135725 from bart0sh/PR211-add-extended-resources-test-cases
Fix extended resource handling for DRA-backed resources on pod admission
2026-01-08 04:03:42 +05:30
Kubernetes Prow Robot
4e69edd0ee
Merge pull request #135392 from brejman/issue-134393-nominated-nodes
Fix queue hint for plugins on change to pods with nominated nodes
2026-01-07 20:05:38 +05:30
Kubernetes Prow Robot
b2ac9e206f
Merge pull request #130231 from Barakmor1/updateimagelocality
Update ImageLocality plugin to account for ImageVolume images
2026-01-05 12:28:37 +05:30
Ed Bartosh
c2361491f5 Fix extended resource handling for DRA-backed resources
In kubelet admission:
   - Remove extended resources from pod requirements if they are either
     backed by DRA or not present in node's allocatable resources

In scheduler (fit.go):
   - Remove fallback logic that delegated all resources to DRA when
     draManager is nil

These changes ensure that:
- DRA-backed extended resources are properly handled during pod admission
- DevicePlugin-backed extended resources still follow standard admission rules
2026-01-02 16:08:49 +02:00
Patrick Ohly
dfa6aa22b2 DRA scheduler: fix unit test flakes
Test_isSchedulableAfterClaimChange was sensitive to system load because of the
arbitrary delay when waiting for the assume cache to catch up. Running inside
a synctest bubble avoids this. While at it, the unit tests get converted
to ktesting (nicer failure output, no extra indention needed for
tCtx.SyncTest).

TestPlugin/prebind-fail-with-binding-timeout relied on setting up a claim with
certain time stamps and then getting that test case tested within a certain
real-world time window. It's surprising that this didn't flake more often
because test execution order is random. Now the time stamp gets set right
before the test case is about to be tested. Conversion to a synctest would
be nicer, but synctests cannot have sub-tests, which are used here to track
where log output and failures come from within the larger test case.

Inside the plugin itself some log output gets added to explain why a claim is
unavailable on a node in case of a binding timeout or error during Filter.
2025-12-30 11:45:02 +01:00
Kubernetes Prow Robot
3226fe520d
Merge pull request #135948 from pohly/dra-scheduler-resource-plugin-unit-test-fix
DRA extended resources: fix flake in unit tests
2025-12-30 16:12:35 +05:30
Kubernetes Prow Robot
2a3a6605ac
Merge pull request #135330 from sujalshah-bit/fix-mem-leak
scheduler: Fix memory leak in scheduler cache
2025-12-29 15:56:34 +05:30
Patrick Ohly
7a4d650125 DRA extended resources: fix flake in unit tests
The tests assumed that instantiating a DRAManager followed by
informerFactory.WaitForCacheSync would be enough to have the manager
up-to-date, but that's not correct: the test only waits for informer *caches*
to be synced, but syncing *event handlers* like the one in the manager may
still be going on. The flake rate is low, though:

    $ GOPATH/bin/stress -p 256 ./noderesources.test
    5s: 0 runs so far, 0 failures, 256 active
    10s: 256 runs so far, 0 failures, 256 active
    15s: 256 runs so far, 0 failures, 256 active
    20s: 512 runs so far, 0 failures, 256 active
    25s: 567 runs so far, 0 failures, 256 active
    30s: 771 runs so far, 0 failures, 256 active

    /tmp/go-stress-20251226T181044-974980161
    --- FAIL: TestCalculateResourceAllocatableRequest (0.81s)
        --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s)
            extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name"
            extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name"
            extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something"
            resource_allocation_test.go:595: Expected requested=2, but got requested=1
    FAIL

It becomes higher when changing WaitForCacheSync such that it doesn't poll and
therefore returns more promptly, which is where this flake was first observed.

The fix is to run the test in a synctest bubble where Wait can be used to wait
for all background activity, including event handling, to be finished before
proceeding with the test.

synctest is less forgiving about lingering goroutines. A synctest bubble must
wait for goroutines to stop, which in this case means that there has to be
a way to wait for the metric recorder shutdown. Event handlers have to be
removed.

This could be done with plain Go, but here test/utils/ktesting is used instead
because it offers some advantages:
- less boilerplate code
- automatic cancellation of the context (i.e. less manual context.WithCancel)
- tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting
  sub-tests. synctest itself needs another anonymous function, which makes
  the line too long and forced re-indention:
     t.Run(... func(...) {
         synctest.Test(... func() {
         })
     })

For the sake of consistency all tests get updated.

While at it, some code gets improved:

- t.Fatal(err) is not a good way to report an error because
  there is no additional markup in the test output that indicates
  that there was an unexpected error. It just logs err.Error(),
  which might not be very informative and/or obvious.
- newTestDRAManager aborts in case of a failure instead of
  returning an error.
2025-12-27 09:47:56 +01:00
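The gap between "informer caches are synced" and "event handlers have finished" described in 7a4d650125 can be illustrated with a toy model. This is a minimal sketch in plain Go and not the client-go API or the synctest-based fix itself: `toyInformer`, `manager`, `hasSynced` and `waitForHandlers` are hypothetical names invented for the illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// toyInformer is a hypothetical stand-in for an informer: the store is
// filled synchronously, while handlers are notified from background goroutines.
type toyInformer struct {
	mu      sync.Mutex
	store   []string
	pending sync.WaitGroup
}

// add records obj in the store right away and delivers it to handler asynchronously.
func (i *toyInformer) add(obj string, handler func(string)) {
	i.mu.Lock()
	i.store = append(i.store, obj)
	i.mu.Unlock()

	i.pending.Add(1)
	go func() {
		defer i.pending.Done()
		handler(obj)
	}()
}

// hasSynced only reports that the store is populated. It says nothing
// about whether handlers have already seen the objects.
func (i *toyInformer) hasSynced() bool {
	i.mu.Lock()
	defer i.mu.Unlock()
	return len(i.store) > 0
}

// waitForHandlers blocks until every queued notification has been delivered.
func (i *toyInformer) waitForHandlers() { i.pending.Wait() }

// manager is a hypothetical consumer that counts the objects it was notified about.
type manager struct{ seen atomic.Int64 }

func (m *manager) onAdd(string) { m.seen.Add(1) }

func (m *manager) count() int64 { return m.seen.Load() }

func main() {
	inf := &toyInformer{}
	m := &manager{}
	inf.add("claim-1", m.onAdd)
	inf.add("claim-2", m.onAdd)

	// The store is already populated here, but m.count() could still be 0, 1 or 2.
	fmt.Println("store synced:", inf.hasSynced())

	// Only after waiting for the handlers is the manager guaranteed to be up to date.
	inf.waitForHandlers()
	fmt.Println("handled:", m.count())
}
```

A test that checks the manager right after `hasSynced` returns true races with the handler goroutines, which is the kind of flake the commit describes. Waiting for all background activity, as the synctest bubble does in the actual fix, removes that race.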
Bartosz
3b4f0be6e3
Check NominatedNodeName to decide if a pod is scheduled 2025-12-19 12:30:06 +00:00
Kubernetes Prow Robot
a504b1b4eb
Merge pull request #135755 from pohly/dra-logging
DRA: log more information
2025-12-18 02:10:38 -08:00
bmordeha
6f57f1e95b Update imageLocality plugin
to account for ImageVolume images when scoring
and prioritizing nodes with required pod images

Signed-off-by: bmordeha <bmordeha@redhat.com>
2025-12-18 09:28:39 +02:00
Kubernetes Prow Robot
c5a0c31294
Merge pull request #135484 from bart0sh/PR209-improve-balanced-allocation-coverage
Extended resources unit tests: cover DRA resources
2025-12-17 22:36:06 -08:00
Kubernetes Prow Robot
1a3d8712f3
Merge pull request #135394 from brejman/adhoc-interpodaffinity-pending-pod-update
Fix queue hint for interpodaffinity when target pod is updated
2025-12-17 21:42:46 -08:00
Bartosz
d6d8639349
Fix queue hint for interpod antiaffinity 2025-12-16 13:01:15 +00:00
Bartosz
145adcd522
Fix queue hint for interpodaffinity when target pod is updated 2025-12-16 12:57:50 +00:00
Patrick Ohly
5d536bfb8e DRA: log more information
For debugging double allocation of the same
device (https://github.com/kubernetes/kubernetes/issues/133602) it is necessary
to have information about pools, devices and in-flight claims. Log calls get
extended and the config for DRA CI jobs updated to enable higher verbosity for
relevant source files.

Log output in such a cluster at verbosity 6 looks like this:

I1215 10:28:54.166872       1 allocator_incubating.go:130] "Gathered pool information" logger="FilterWithNominatedPods.Filter.DynamicResources" pod="dra-8841/tester-3" node="kind-worker2" pools={"count":1,"devices":["dra-8841.k8s.io/kind-worker2/device-00"],"meta":[{"InvalidReason":"","id":"dra-8841.k8s.io/kind-worker2","isIncomplete":false,"isInvalid":false}]}
I1215 10:28:54.166941       1 allocator_incubating.go:254] "Gathered information about devices" logger="FilterWithNominatedPods.Filter.DynamicResources" pod="dra-8841/tester-3" node="kind-worker2" allocatedDevices={"count":2,"devices":["dra-8841.k8s.io/kind-worker/device-00","dra-8841.k8s.io/kind-worker3/device-00"]} minDevicesToBeAllocated=1
2025-12-16 09:58:05 +01:00
Ed Bartosh
1820dc7535 Fit tests: add DRA-aware test cases 2025-12-12 15:48:18 +02:00
Ed Bartosh
7860effc2c resourceAllocationScorer: add unit test for DRA nodeMatches 2025-12-12 15:48:13 +02:00
Ed Bartosh
02a39d6c1e Balanced allocation tests: cover DRA resources
- Added DRA-aware test cases
- Pulled shared DRA setup out into helper to keep tests DRY
- Added SignPod test
2025-12-12 13:51:19 +02:00
Ravi Sastry Kadali
9dc5683c56 scheduler: Fix memory leak in scheduler cache
The `removeSlice` function was leaving behind references to the
removed element, preventing it from being garbage-collected.
This commit ensures that removed entries are fully cleared,
eliminating the memory leak.

Co-authored-by: ravisastryk <ravisastryk@gmail.com>
Signed-off-by: Sujal Shah <sujalshah28092004@gmail.com>
2025-11-20 02:18:38 +05:30
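The kind of leak described in 9dc5683c56 can be sketched as follows. This is a generic illustration, assuming a slice of pointers. `item` and `removeAt` are hypothetical names and not the scheduler's actual `removeSlice` code, which is not shown in this log.

```go
package main

import "fmt"

// item stands in for a cached object; the name is hypothetical.
type item struct{ name string }

// removeAt deletes s[i] while preserving order. After shifting the tail
// left, the last slot of the old length is set to nil so that the backing
// array does not keep a stale pointer alive.
func removeAt(s []*item, i int) []*item {
	copy(s[i:], s[i+1:])
	s[len(s)-1] = nil // without this, the slot past the new length still holds a pointer
	return s[:len(s)-1]
}

func main() {
	s := []*item{{"a"}, {"b"}, {"c"}}
	backing := s // full-length view of the same backing array

	s = removeAt(s, 1)

	fmt.Println(len(s), s[0].name, s[1].name)
	fmt.Println("tail slot cleared:", backing[2] == nil)
}
```

Shrinking a slice with `s[:len(s)-1]` only changes its length. The garbage collector still sees every pointer stored in the backing array, so slots beyond the new length have to be cleared explicitly.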
bwsalmon
854e67bb51
KEP 5598: Opportunistic Batching (#135231)
* First version of batching w/out signatures.

* First version of pod signatures.

* Integrate batching with signatures.

* Fix merge conflicts.

* Fixes from self-review.

* Test fixes.

* Fix a bug that limited batches to size 2
Also add some new high-level logging and
simplify the pod affinity signature.

* Re-enable batching on perf tests for now.

* fwk.NewStatus(fwk.Success)

* Review feedback.

* Review feedback.

* Comment fix.

* Two plugin-specific unit tests.

* Add cycle state to the sign call, apply to topo spread.
Also add unit tests for several plugin signature
calls.

* Review feedback.

* Switch to distinct stats for hint and store calls.

* Switch signature from string to []byte

* Revert cyclestate in signs. Update node affinity.
Node affinity now sorts all of the various
nested arrays in the structure. CycleState no
longer in signature; revert to signing fewer
cases for pod spread.

* hack/update-vendor.sh

* Disable signatures when extenders are configured.

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update staging/src/k8s.io/kube-scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Disable node resource signatures when extended DRA enabled.

* Review feedback.

* Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Fixes for review suggestions.

* Add integration tests.

* Linter fixes, test fix.

* Whitespace fix.

* Remove broken test.

* Unschedulable test.

* Remove go.mod changes.

---------

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-11-12 21:51:37 -08:00
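The idea of sorting nested arrays before signing, mentioned in 854e67bb51 for node affinity, can be sketched like this. The sketch assumes a heavily simplified data model. `term` and `sign` are hypothetical and the encoding is made up for illustration; it is not the signature format used by the scheduler.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"strings"
)

// term is a hypothetical, simplified selector term: a key and its allowed values.
type term struct {
	key    string
	values []string
}

// sign returns an order-independent signature for terms. The values of each
// term and the terms themselves are sorted on copies, so the caller's data
// is left untouched.
func sign(terms []term) []byte {
	parts := make([]string, 0, len(terms))
	for _, t := range terms {
		vals := append([]string(nil), t.values...)
		sort.Strings(vals)
		parts = append(parts, t.key+"="+strings.Join(vals, ","))
	}
	sort.Strings(parts)
	return []byte(strings.Join(parts, ";"))
}

func main() {
	a := []term{{"zone", []string{"b", "a"}}, {"arch", []string{"amd64"}}}
	b := []term{{"arch", []string{"amd64"}}, {"zone", []string{"a", "b"}}}

	fmt.Printf("%s\n", sign(a))
	fmt.Println("equal:", bytes.Equal(sign(a), sign(b)))
}
```

Without the sorting step, two pods whose specs differ only in the order of list entries would get different signatures and could not share a batch.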
ndixita
7645eb70e9
Scheduler changes to support pod level resources in place resize 2025-11-11 18:15:22 +00:00
Heba
aceb89debc
KEP-5471: Extend tolerations operators (#134665)
* Add numeric operations to tolerations

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* code review feedback

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* add default feature gate

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add integration tests

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add toleration value validation

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add validate options for new operators

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove log

Signed-off-by: helayoty <heelayot@microsoft.com>

* Update feature gate check

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove IsValidNumericString func

Signed-off-by: helayoty <heelayot@microsoft.com>

* Implement IsDecimalInteger

Signed-off-by: helayoty <heelayot@microsoft.com>

* code review feedback

Signed-off-by: helayoty <heelayot@microsoft.com>

* Add logs to v1/toleration

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: helayoty <heelayot@microsoft.com>

* Update integration tests and address code review feedback

Signed-off-by: helayoty <heelayot@microsoft.com>

* Add feature gate to the scheduler framework

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove extra test

Signed-off-by: helayoty <heelayot@microsoft.com>

* Fix integration test

Signed-off-by: helayoty <heelayot@microsoft.com>

* pass feature gate via TolerationsTolerateTaint

Signed-off-by: helayoty <heelayot@microsoft.com>

---------

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: helayoty <heelayot@microsoft.com>
2025-11-10 12:42:54 -08:00
Kubernetes Prow Robot
0cfbf89e70
Merge pull request #134189 from mortent/NewUpdatePartitionableDevices
Updates to DRA Partitionable Devices feature
2025-11-06 16:10:53 -08:00
Kubernetes Prow Robot
6232175b94
Merge pull request #134935 from alaypatel07/refactor-dra-extended-resources
refactor dra extended resources implementation in scheduler plugin
2025-11-06 15:18:59 -08:00
Morten Torkildsen
38b5750e33 DRA: Update allocator for Partitionable Devices 2025-11-06 21:30:01 +00:00
Alay Patel
f8ccc4c4d7 dra scheduler plugin: refactor extendeddynamicresources.go for readability
Signed-off-by: Alay Patel <alayp@nvidia.com>
2025-11-06 15:49:33 -05:00
Kubernetes Prow Robot
22962087ec
Merge pull request #135186 from pohly/dra-scheduler-unit-test-flake
DRA: fix for scheduler unit test flake + logging
2025-11-06 12:43:23 -08:00
Alay Patel
da9f1d8eed dra scheduler plugin: move extended resources functions into separate file
Signed-off-by: Alay Patel <alayp@nvidia.com>
2025-11-06 14:58:59 -05:00
Kubernetes Prow Robot
14134e03a8
Merge pull request #134058 from bart0sh/PR200-DRA-scoring-extended-resources
Implement scoring for extended resources backed up by DRA
2025-11-06 11:50:52 -08:00
Patrick Ohly
1c4cab9dda DRA scheduler unit test: fix race with ResourceSlice informer
The test started without waiting for the ResourceSlice informer to have
synced. As a result, the "CEL-runtime-error-for-one-of-three-nodes" test case
failed randomly with a very low flake rate (less than 1% in local runs) because
CEL expressions never got evaluated due to not having the slices (yet).

Other tests also were less reliable, but not known to fail.
2025-11-06 18:40:35 +01:00
Ed Bartosh
fc404b6a3d Cache DRA state for scoring extended resources
Extend Fit and BalancedAllocation PreScore state with the
allocated state, the list of ResourceSlices and the device class
mapping. Gather these once during PreScore and pass them through
the scoring path instead of re-fetching for every scoring call.

This should speed up scoring of DRA extended resources, lowering
scheduling overhead.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Maciej Skoczeń <mskoczen@google.com>
Co-authored-by: Dominik Marciński <gmidon@gmail.com>
2025-11-06 18:09:11 +02:00
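The pattern from fc404b6a3d, gathering state once during PreScore and reusing it for every per-node scoring call, can be sketched as follows. All names (`draSnapshot`, `fetcher`, `gather`, `score`) and the scoring formula are hypothetical. The sketch only shows the caching structure, not the real plugin code.

```go
package main

import "fmt"

// draSnapshot is a hypothetical stand-in for the state gathered once per
// scheduling cycle (allocated state, ResourceSlices, device class mapping).
type draSnapshot struct {
	allocatable map[string]int64 // node name -> allocatable devices
}

// fetcher counts how often the expensive lookup runs.
type fetcher struct{ calls int }

// gather plays the role of the PreScore step: one expensive lookup per pod.
func (f *fetcher) gather(nodes []string) *draSnapshot {
	f.calls++
	snap := &draSnapshot{allocatable: make(map[string]int64, len(nodes))}
	for i, n := range nodes {
		snap.allocatable[n] = int64(i + 1) // made-up capacities
	}
	return snap
}

// score plays the role of the per-node Score step. It only reads the
// snapshot and returns the percentage of devices left free (made-up formula).
func score(snap *draSnapshot, node string, requested int64) int64 {
	alloc := snap.allocatable[node]
	if alloc == 0 || requested > alloc {
		return 0
	}
	return 100 * (alloc - requested) / alloc
}

func main() {
	f := &fetcher{}
	nodes := []string{"node-a", "node-b", "node-c"}

	snap := f.gather(nodes) // once per pod
	for _, n := range nodes {
		fmt.Println(n, score(snap, n, 1)) // once per node, no re-fetching
	}
	fmt.Println("expensive lookups:", f.calls)
}
```

With this structure the number of expensive lookups stays at one per pod regardless of the number of nodes, instead of growing with every scoring call.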
Maciej Skoczeń
8d67173de0 Implement Gang scheduling in kube-scheduler 2025-11-06 10:47:29 +00:00
Kubernetes Prow Robot
b869afe68d
Merge pull request #133389 from pravk03/node-capabilities
Introduce node declared features framework
2025-11-06 01:32:54 -08:00
Ed Bartosh
edbc32fa60 DRA: implement scoring for extended resources
Updated extended resource allocation scorer to calculate
allocatable and requested values for DRA-backed resources.
2025-11-06 10:40:52 +02:00
Kubernetes Prow Robot
7537d52c2e
Merge pull request #134882 from yliaog/initcon
Fix non-sidecar init container device requests
2025-11-05 21:57:04 -08:00
Kubernetes Prow Robot
f025bcace9
Merge pull request #135068 from pohly/dra-device-taints-1.35-full
DRA device taint eviction: several improvements
2025-11-05 18:52:58 -08:00
Praveen Krishna
649d9c532a feat(scheduler): Add NodeDeclaredFeatures scheduler plugin. 2025-11-06 01:21:04 +00:00
yliao
6676982316 fixed non-sidecar init container device requests and mappings 2025-11-05 22:48:50 +00:00
Kubernetes Prow Robot
cf37f0bf49
Merge pull request #135037 from yliaog/extendedresourcecache
pick one device class deterministically for extended resource
2025-11-05 14:16:58 -08:00
Kubernetes Prow Robot
738475f9e2
Merge pull request #134991 from yliaog/class_events
added device class add/update events to noderesources plugin when DRAExtendedResource feature is enabled
2025-11-05 14:16:51 -08:00
Kubernetes Prow Robot
799572b8db
Merge pull request #134711 from mortent/SimpleScoringForPrioritizedList
DRA: Add scoring for Prioritized List feature
2025-11-05 12:36:51 -08:00
Patrick Ohly
eaee6b6bce DRA device taints: add separate feature gate for rules
Support for DeviceTaintRules depends on a significant amount of
additional code:
- ResourceSlice tracker is a NOP without it.
- Additional informers and corresponding permissions in scheduler and controller.
- Controller code for handling status.

Not all users necessarily need DeviceTaintRules, so adding a second feature
gate for that code makes it possible to limit the blast radius of bugs in that
code without having to turn off device taints and tolerations entirely.
2025-11-05 20:03:17 +01:00
Kubernetes Prow Robot
3395c5358c
Merge pull request #135012 from gnufied/volume-limits-redux-cas
Do not schedule pods to a node without CSI driver
2025-11-05 09:42:58 -08:00
Morten Torkildsen
fbfeb33231 DRA: Add scoring for Prioritized List feature 2025-11-05 17:18:38 +00:00
yliao
949be1d132 fixed comments due to switch from class name to class for GetDeviceClass 2025-11-05 15:08:38 +00:00
Hemant Kumar
c77a39c06f Address review comments and fix failing tests 2025-11-05 09:44:50 -05:00
Ayato Tokubi
902c2e0c15 Fix lint errors in dynamicresources_test.go
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2025-11-05 10:44:50 +00:00
Ayato Tokubi
c5b1493925 Add test case for claim creation failure in DRAExtendedResources
Extend the `setup` function to support API reactors, allowing custom reactions in tests.

Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2025-11-05 09:55:28 +00:00