1990 commits

Author SHA1 Message Date
Antoni Zawodny
833b7205fc Run PreBind plugins in parallel if feasible 2026-01-11 14:19:18 +01:00
Antoni Zawodny
16b375e4ef Generalize ErrorChannel to other underlying types 2026-01-11 13:58:06 +01:00
Kubernetes Prow Robot
b54554b72d
Merge pull request #135955 from utam0k/async-metrics
scheduler: align the meaning of victim metrics between async preemption and sync preemption
2026-01-08 20:39:41 +05:30
utam0k
44e0c79406
Align the meaning of victim metrics between async preemption and sync preemption
Signed-off-by: utam0k <k0ma@utam0k.jp>
2026-01-08 21:02:17 +09:00
Kubernetes Prow Robot
8ab1bc1633
Merge pull request #135725 from bart0sh/PR211-add-extended-resources-test-cases
Fix extended resource handling for DRA-backed resources on pod admission
2026-01-08 04:03:42 +05:30
Kubernetes Prow Robot
4e69edd0ee
Merge pull request #135392 from brejman/issue-134393-nominated-nodes
Fix queue hint for plugins on change to pods with nominated nodes
2026-01-07 20:05:38 +05:30
Kubernetes Prow Robot
b2ac9e206f
Merge pull request #130231 from Barakmor1/updateimagelocality
Update ImageLocality plugin to account for ImageVolume images
2026-01-05 12:28:37 +05:30
Ed Bartosh
c2361491f5 Fix extended resource handling for DRA-backed resources
In kubelet admission:
   - Remove extended resources from pod requirements if they are either
     backed by DRA or not present in node's allocatable resources

In scheduler (fit.go):
   - Remove fallback logic that delegated all resources to DRA when
     draManager is nil

These changes ensure that:
- DRA-backed extended resources are properly handled during pod admission
- DevicePlugin-backed extended resources still follow standard admission rules
2026-01-02 16:08:49 +02:00
Patrick Ohly
dfa6aa22b2 DRA scheduler: fix unit test flakes
Test_isSchedulableAfterClaimChange was sensitive to system load because of the
arbitrary delay when waiting for the assume cache to catch up. Running inside
a synctest bubble avoids this. While at it, the unit tests get converted
to ktesting (nicer failure output, no extra indentation needed for
tCtx.SyncTest).

TestPlugin/prebind-fail-with-binding-timeout relied on setting up a claim with
certain time stamps and then getting that test case tested within a certain
real-world time window. It's surprising that this didn't flake more often
because test execution order is random. Now the time stamp gets set right
before the test case is about to be tested. Conversion to a synctest would
be nicer, but synctests cannot have sub-tests, which are used here to track
where log output and failures come from within the larger test case.

Inside the plugin itself some log output gets added to explain why a claim is
unavailable on a node in case of a binding timeout or error during Filter.
2025-12-30 11:45:02 +01:00
Kubernetes Prow Robot
3226fe520d
Merge pull request #135948 from pohly/dra-scheduler-resource-plugin-unit-test-fix
DRA extended resources: fix flake in unit tests
2025-12-30 16:12:35 +05:30
Kubernetes Prow Robot
2a3a6605ac
Merge pull request #135330 from sujalshah-bit/fix-mem-leak
scheduler: Fix memory leak in scheduler cache
2025-12-29 15:56:34 +05:30
Patrick Ohly
7a4d650125 DRA extended resources: fix flake in unit tests
The tests assumed that instantiating a DRAManager followed by
informerFactory.WaitForCacheSync would be enough to have the manager
up-to-date, but that's not correct: the test only waits for informer *caches*
to be synced, but syncing *event handlers* like the one in the manager may
still be going on. The flake rate is low, though:

    $ GOPATH/bin/stress -p 256 ./noderesources.test
    5s: 0 runs so far, 0 failures, 256 active
    10s: 256 runs so far, 0 failures, 256 active
    15s: 256 runs so far, 0 failures, 256 active
    20s: 512 runs so far, 0 failures, 256 active
    25s: 567 runs so far, 0 failures, 256 active
    30s: 771 runs so far, 0 failures, 256 active

    /tmp/go-stress-20251226T181044-974980161
    --- FAIL: TestCalculateResourceAllocatableRequest (0.81s)
        --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s)
            extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name"
            extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name"
            extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something"
            resource_allocation_test.go:595: Expected requested=2, but got requested=1
    FAIL

It becomes higher when changing WaitForCacheSync such that it doesn't poll and
therefore returns more promptly, which is where this flake was first observed.

The fix is to run the test in a synctest bubble where Wait can be used to wait
for all background activity, including event handling, to be finished before
proceeding with the test.

synctest is less forgiving about lingering goroutines. A synctest bubble must
wait for goroutines to stop, which in this case means that there has to be
a way to wait for the metric recorder shutdown. Event handlers have to be
removed.

This could be done with plain Go, but here test/utils/ktesting is used instead
because it offers some advantages:
- less boilerplate code
- automatic cancellation of the context (i.e. less manual context.WithCancel)
- tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting
  sub-tests. synctest itself needs another anonymous function, which makes
  the line too long and forces re-indentation:
     t.Run(... func(...) {
         synctest.Test(... func() {
         })
     })

For the sake of consistency all tests get updated.

While at it, some code gets improved:

- t.Fatal(err) is not a good way to report an error because
  there is no additional markup in the test output that indicates
  that there was an unexpected error. It just logs err.Error(),
  which might not be very informative and/or obvious.
- newTestDRAManager aborts in case of a failure instead of
  returning an error.
2025-12-27 09:47:56 +01:00
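The difference between a synced cache and finished event handlers, as described in the commit above, can be illustrated with a small stand-alone sketch. This is plain Go, not client-go and not synctest: the `informer` and `manager` types and all of their methods are hypothetical, and `WaitForHandlers` merely plays the role that Wait plays inside a synctest bubble.

```go
package main

import (
	"fmt"
	"sync"
)

// informer is a hypothetical stand-in for a shared informer: Add updates the
// cache synchronously but delivers the event to the handler in the background.
type informer struct {
	mu      sync.Mutex
	cache   map[string]int
	pending sync.WaitGroup
	handler func(key string, val int)
}

func newInformer(handler func(string, int)) *informer {
	return &informer{cache: map[string]int{}, handler: handler}
}

func (i *informer) Add(key string, val int) {
	i.mu.Lock()
	i.cache[key] = val
	i.mu.Unlock()
	i.pending.Add(1)
	go func() {
		defer i.pending.Done()
		i.handler(key, val)
	}()
}

// CacheLen reflects only the cache; it says nothing about handler progress.
func (i *informer) CacheLen() int {
	i.mu.Lock()
	defer i.mu.Unlock()
	return len(i.cache)
}

// WaitForHandlers blocks until every delivered event has been processed.
func (i *informer) WaitForHandlers() { i.pending.Wait() }

// manager derives its own state from events (hypothetical).
type manager struct {
	mu    sync.Mutex
	total int
}

func (m *manager) onAdd(_ string, val int) {
	m.mu.Lock()
	m.total += val
	m.mu.Unlock()
}

func (m *manager) Total() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.total
}

func main() {
	m := &manager{}
	inf := newInformer(m.onAdd)
	inf.Add("device-a", 1)
	inf.Add("device-b", 1)
	// The cache is complete at this point, but m.Total() may still lag behind.
	fmt.Println("cache entries:", inf.CacheLen())
	inf.WaitForHandlers()
	fmt.Println("total after handlers finished:", m.Total())
}
```

A test that asserts on `m.Total()` right after checking the cache would be racy in the same way as the flake described above; asserting only after the handlers have finished is deterministic.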
Bartosz
3b4f0be6e3
Check NominatedNodeName to decide if a pod is scheduled 2025-12-19 12:30:06 +00:00
Patrick Ohly
ad79e479c2 build: remove deprecated '// +build' tag
This has been replaced by `//go:build ...` for a long time now.

Removal of the old build tag was automated with:

    for i in $(git grep -l '^// +build' | grep -v -e '^vendor/'); do if ! grep -q '^// Code generated' "$i"; then sed -i -e '/^\/\/ +build/d' "$i"; fi; done
2025-12-18 12:16:21 +01:00
Kubernetes Prow Robot
a504b1b4eb
Merge pull request #135755 from pohly/dra-logging
DRA: log more information
2025-12-18 02:10:38 -08:00
bmordeha
6f57f1e95b Update imageLocality plugin
to account for ImageVolume images when scoring
and prioritizing nodes with required pod images

Signed-off-by: bmordeha <bmordeha@redhat.com>
2025-12-18 09:28:39 +02:00
Kubernetes Prow Robot
4a1cbabadd
Merge pull request #135495 from tosi3k/skip-last-pod-deletion
Skip last victim in async preemption if any prior Pod preemption failed
2025-12-17 22:36:28 -08:00
Kubernetes Prow Robot
62db4db266
Merge pull request #135489 from ania-borowiec/update_comment
Update async preemption comment to reflect the current state of the code
2025-12-17 22:36:13 -08:00
Kubernetes Prow Robot
c5a0c31294
Merge pull request #135484 from bart0sh/PR209-improve-balanced-allocation-coverage
Extended resources unit tests: cover DRA resources
2025-12-17 22:36:06 -08:00
Kubernetes Prow Robot
1a3d8712f3
Merge pull request #135394 from brejman/adhoc-interpodaffinity-pending-pod-update
Fix queue hint for interpodaffinity when target pod is updated
2025-12-17 21:42:46 -08:00
Kubernetes Prow Robot
285eb9fdba
Merge pull request #135325 from brejman/issue-134393
Fix queue hint for inter-pod anti-affinity
2025-12-17 20:01:02 -08:00
Bartosz
d6d8639349
Fix queue hint for interpod antiaffinity 2025-12-16 13:01:15 +00:00
Bartosz
145adcd522
Fix queue hint for interpodaffinity when target pod is updated 2025-12-16 12:57:50 +00:00
Patrick Ohly
5d536bfb8e DRA: log more information
For debugging double allocation of the same
device (https://github.com/kubernetes/kubernetes/issues/133602) it is necessary
to have information about pools, devices and in-flight claims. Log calls get
extended and the config for DRA CI jobs updated to enable higher verbosity for
relevant source files.

Log output in such a cluster at verbosity 6 looks like this:

I1215 10:28:54.166872       1 allocator_incubating.go:130] "Gathered pool information" logger="FilterWithNominatedPods.Filter.DynamicResources" pod="dra-8841/tester-3" node="kind-worker2" pools={"count":1,"devices":["dra-8841.k8s.io/kind-worker2/device-00"],"meta":[{"InvalidReason":"","id":"dra-8841.k8s.io/kind-worker2","isIncomplete":false,"isInvalid":false}]}
I1215 10:28:54.166941       1 allocator_incubating.go:254] "Gathered information about devices" logger="FilterWithNominatedPods.Filter.DynamicResources" pod="dra-8841/tester-3" node="kind-worker2" allocatedDevices={"count":2,"devices":["dra-8841.k8s.io/kind-worker/device-00","dra-8841.k8s.io/kind-worker3/device-00"]} minDevicesToBeAllocated=1
2025-12-16 09:58:05 +01:00
Ed Bartosh
1820dc7535 Fit tests: add DRA-aware test cases 2025-12-12 15:48:18 +02:00
Ed Bartosh
7860effc2c resourceAllocationScorer: add unit test for DRA nodeMatches 2025-12-12 15:48:13 +02:00
Ed Bartosh
02a39d6c1e Balanced allocation tests: cover DRA resources
- Added DRA-aware test cases
- Pulled shared DRA setup out into helper to keep tests DRY
- Added SignPod test
2025-12-12 13:51:19 +02:00
Antoni Zawodny
7577f84e79 Skip last victim in async preemption if any prior Pod preemption failed 2025-12-10 14:44:06 +01:00
Ania Borowiec
0cf3d0e20a
Update comment to reflect the current state of the code 2025-11-27 22:10:02 +00:00
Mohammad Varmazyar
4c2fff1934 Address comments, log level, test assertion consistency and remove unnecessary locks in TestFlushUnschedulablePodsLeftoverSetsFlag 2025-11-26 14:08:05 +01:00
Mohammad Varmazyar
4f455c9c0d Refactor plugin clearing to use ClearRejectorPlugins method 2025-11-26 09:54:32 +01:00
Mohammad Varmazyar
bc632c72d0 scheduler: add metric for pods scheduled after flush
Add counter metric to track pods that schedule immediately after
being flushed from unschedulablePods due to timeout. Uses a boolean
flag that is cleared when pods return to queue or move via events.
2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
b2a399cf30 scheduler: add metric for pods scheduled after flush
This metric tracks pods that successfully schedule after being
flushed from unschedulablePods due to timeout. High values may
indicate missing queue hint optimizations or event handling issues.
2025-11-24 09:38:40 +01:00
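The flag-and-counter bookkeeping described in the two commits above can be sketched as follows. All names here (`queuedPod`, `queue`, `flush`, `moveByEvent`, `scheduled`) are hypothetical, and a plain integer stands in for the real counter metric.

```go
package main

import "fmt"

// queuedPod is a hypothetical stand-in for the scheduler's queued pod info.
type queuedPod struct {
	name    string
	flushed bool // set when the pod left unschedulablePods due to timeout
}

// queue holds only the counter that would back the metric.
type queue struct {
	scheduledAfterFlush int
}

// flush marks a pod that is moved out of unschedulablePods by the timeout.
func (q *queue) flush(p *queuedPod) { p.flushed = true }

// moveByEvent clears the flag: the pod was moved for a real reason, so a
// later successful schedule should not be attributed to the flush.
func (q *queue) moveByEvent(p *queuedPod) { p.flushed = false }

// scheduled counts the pod only if the flag is still set.
func (q *queue) scheduled(p *queuedPod) {
	if p.flushed {
		q.scheduledAfterFlush++
		p.flushed = false
	}
}

func main() {
	q := &queue{}

	a := &queuedPod{name: "a"}
	q.flush(a)
	q.scheduled(a) // counted: scheduled right after a timeout flush

	b := &queuedPod{name: "b"}
	q.flush(b)
	q.moveByEvent(b)
	q.scheduled(b) // not counted: an event moved it in between

	fmt.Println("scheduled after flush:", q.scheduledAfterFlush)
}
```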
Ravi Sastry Kadali
9dc5683c56 scheduler: Fix memory leak in scheduler cache
The `removeSlice` function was leaving behind references to the
removed element, preventing it from being garbage-collected.
This commit ensures that removed entries are fully cleared,
eliminating the memory leak.

Co-authored-by: ravisastryk <ravisastryk@gmail.com>
Signed-off-by: Sujal Shah <sujalshah28092004@gmail.com>
2025-11-20 02:18:38 +05:30
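The kind of leak described in this commit can be sketched as follows. This is a minimal illustration, not the scheduler's actual `removeSlice`: the `removeAt` helper and the `entry` type are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a cached object.
type entry struct{ name string }

// removeAt deletes s[i] while preserving order. Shrinking the slice alone
// leaves a stale pointer in the vacated last slot of the backing array, which
// can keep an element reachable after it has been removed from the slice.
// Setting that slot to nil lets the garbage collector reclaim it.
func removeAt(s []*entry, i int) []*entry {
	copy(s[i:], s[i+1:])
	s[len(s)-1] = nil // clear the vacated tail slot
	return s[:len(s)-1]
}

func main() {
	s := []*entry{{"a"}, {"b"}, {"c"}}
	s = removeAt(s, 1)
	fmt.Println(len(s), s[0].name, s[1].name)
	// Reslicing up to the old length shows that the tail slot was cleared.
	fmt.Println(s[:3][2] == nil)
}
```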
bwsalmon
854e67bb51
KEP 5598: Opportunistic Batching (#135231)
* First version of batching w/out signatures.

* First version of pod signatures.

* Integrate batching with signatures.

* Fix merge conflicts.

* Fixes from self-review.

* Test fixes.

* Fix a bug that limited batches to size 2
Also add some new high-level logging and
simplify the pod affinity signature.

* Re-enable batching on perf tests for now.

* fwk.NewStatus(fwk.Success)

* Review feedback.

* Review feedback.

* Comment fix.

* Two plugin-specific unit tests.

* Add cycle state to the sign call, apply to topo spread.
Also add unit tests for several plugin signature
calls.

* Review feedback.

* Switch to distinct stats for hint and store calls.

* Switch signature from string to []byte

* Revert cyclestate in signs. Update node affinity.
Node affinity now sorts all of the various
nested arrays in the structure. CycleState no
longer in signature; revert to signing fewer
cases for pod spread.

* hack/update-vendor.sh

* Disable signatures when extenders are configured.

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update staging/src/k8s.io/kube-scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Disable node resource signatures when extended DRA enabled.

* Review feedback.

* Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Fixes for review suggestions.

* Add integration tests.

* Linter fixes, test fix.

* Whitespace fix.

* Remove broken test.

* Unschedulable test.

* Remove go.mod changes.

---------

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-11-12 21:51:37 -08:00
ndixita
5ac2ffcc1e
Enabling NodeDeclaredFeatures in unit tests
Signed-off-by: ndixita <ndixita@google.com>
2025-11-12 08:26:15 +00:00
ndixita
7645eb70e9
Scheduler changes to support pod level resources in place resize 2025-11-11 18:15:22 +00:00
Heba
aceb89debc
KEP-5471: Extend tolerations operators (#134665)
* Add numeric operations to tolerations

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* code review feedback

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* add default feature gate

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add integration tests

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add toleration value validation

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>

* Add validate options for new operators

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove log

Signed-off-by: helayoty <heelayot@microsoft.com>

* Update feature gate check

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove IsValidNumericString func

Signed-off-by: helayoty <heelayot@microsoft.com>

* Implement IsDecimalInteger

Signed-off-by: helayoty <heelayot@microsoft.com>

* code review feedback

Signed-off-by: helayoty <heelayot@microsoft.com>

* Add logs to v1/toleration

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: helayoty <heelayot@microsoft.com>

* Update integration tests and address code review feedback

Signed-off-by: helayoty <heelayot@microsoft.com>

* Add feature gate to the scheduler framework

Signed-off-by: helayoty <heelayot@microsoft.com>

* Remove extra test

Signed-off-by: helayoty <heelayot@microsoft.com>

* Fix integration test

Signed-off-by: helayoty <heelayot@microsoft.com>

* pass feature gate via TolerationsTolerateTaint

Signed-off-by: helayoty <heelayot@microsoft.com>

---------

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: helayoty <heelayot@microsoft.com>
2025-11-10 12:42:54 -08:00
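A numeric toleration check of the kind this commit adds could look like the sketch below. The operator names `Gt` and `Lt`, the comparison direction (taint value compared against the toleration value), and the use of `strconv.ParseInt` for validation are all assumptions made for illustration; the `tolerates` function is hypothetical and is not the Kubernetes API.

```go
package main

import (
	"fmt"
	"strconv"
)

// tolerates reports whether a toleration with the given operator and value
// matches a taint value. Values that are not decimal integers never match.
func tolerates(op, tolerationValue, taintValue string) bool {
	tol, err1 := strconv.ParseInt(tolerationValue, 10, 64)
	taint, err2 := strconv.ParseInt(taintValue, 10, 64)
	if err1 != nil || err2 != nil {
		return false
	}
	switch op {
	case "Gt": // assumed: matches when the taint value exceeds the toleration value
		return taint > tol
	case "Lt": // assumed: matches when the taint value is below the toleration value
		return taint < tol
	default:
		return false
	}
}

func main() {
	fmt.Println(tolerates("Gt", "950", "990"))
	fmt.Println(tolerates("Lt", "950", "990"))
	fmt.Println(tolerates("Gt", "950", "high"))
}
```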
Kubernetes Prow Robot
0cfbf89e70
Merge pull request #134189 from mortent/NewUpdatePartitionableDevices
Updates to DRA Partitionable Devices feature
2025-11-06 16:10:53 -08:00
Kubernetes Prow Robot
6232175b94
Merge pull request #134935 from alaypatel07/refactor-dra-extended-resources
refactor dra extended resources implementation in scheduler plugin
2025-11-06 15:18:59 -08:00
Morten Torkildsen
38b5750e33 DRA: Update allocator for Partitionable Devices 2025-11-06 21:30:01 +00:00
Alay Patel
f8ccc4c4d7 dra scheduler plugin: refactor extendeddynamicresources.go for readability
Signed-off-by: Alay Patel <alayp@nvidia.com>
2025-11-06 15:49:33 -05:00
Kubernetes Prow Robot
22962087ec
Merge pull request #135186 from pohly/dra-scheduler-unit-test-flake
DRA: fix for scheduler unit test flake + logging
2025-11-06 12:43:23 -08:00
Alay Patel
da9f1d8eed dra scheduler plugin: move extended resources functions into separate file
Signed-off-by: Alay Patel <alayp@nvidia.com>
2025-11-06 14:58:59 -05:00
Kubernetes Prow Robot
14134e03a8
Merge pull request #134058 from bart0sh/PR200-DRA-scoring-extended-resources
Implement scoring for extended resources backed by DRA
2025-11-06 11:50:52 -08:00
Patrick Ohly
1c4cab9dda DRA scheduler unit test: fix race with ResourceSlice informer
The test started without waiting for the ResourceSlice informer to have
synced. As a result, the "CEL-runtime-error-for-one-of-three-nodes" test case
failed randomly with a very low flake rate (less than 1% in local runs) because
CEL expressions never got evaluated due to not having the slices (yet).

Other tests also were less reliable, but not known to fail.
2025-11-06 18:40:35 +01:00
Ed Bartosh
fc404b6a3d Cache DRA state for scoring extended resources
Extend Fit and BalancedAllocation PreScore state with the
allocated state, the list of ResourceSlices and the device class
mapping. Gather these once during PreScore and pass them through
the scoring path instead of re-fetching for every scoring call.

This should speed up scoring of DRA extended resources, lowering
scheduling overhead.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Maciej Skoczeń <mskoczen@google.com>
Co-authored-by: Dominik Marciński <gmidon@gmail.com>
2025-11-06 18:09:11 +02:00
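The gather-once pattern described in this commit can be sketched as follows. None of these names come from the scheduler: `draSnapshot`, `lister`, `preScoreState`, `preScore` and `score` are hypothetical, and the scoring formula is a placeholder.

```go
package main

import "fmt"

// draSnapshot is a hypothetical bundle of per-cycle DRA data.
type draSnapshot struct {
	capacity  map[string]int // devices per node
	allocated map[string]int // devices already allocated per node
}

// lister simulates an expensive lookup and counts how often it is called.
type lister struct{ calls int }

func (l *lister) snapshot() draSnapshot {
	l.calls++
	return draSnapshot{
		capacity:  map[string]int{"node-1": 4, "node-2": 4, "node-3": 2},
		allocated: map[string]int{"node-1": 1, "node-2": 4, "node-3": 0},
	}
}

// preScoreState is what PreScore would store for the later Score calls.
type preScoreState struct{ snap draSnapshot }

// preScore gathers the data once per scheduling cycle.
func preScore(l *lister) *preScoreState {
	return &preScoreState{snap: l.snapshot()}
}

// score reads only the cached state; it never calls the lister.
func score(s *preScoreState, node string) int {
	total := s.snap.capacity[node]
	if total == 0 {
		return 0
	}
	return 100 * (total - s.snap.allocated[node]) / total
}

func main() {
	l := &lister{}
	state := preScore(l)
	for _, node := range []string{"node-1", "node-2", "node-3"} {
		fmt.Println(node, score(state, node))
	}
	fmt.Println("lister calls:", l.calls)
}
```

With this shape the number of expensive lookups stays at one per cycle regardless of how many nodes are scored.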
Maciej Skoczeń
8d67173de0 Implement Gang scheduling in kube-scheduler 2025-11-06 10:47:29 +00:00
Kubernetes Prow Robot
b869afe68d
Merge pull request #133389 from pravk03/node-capabilities
Introduce node declared features framework
2025-11-06 01:32:54 -08:00
Ed Bartosh
edbc32fa60 DRA: implement scoring for extended resources
Updated extended resource allocation scorer to calculate
allocatable and requested values for DRA-backed resources.
2025-11-06 10:40:52 +02:00