Commit graph

200 commits

Author SHA1 Message Date
Antoni Zawodny
833b7205fc Run PreBind plugins in parallel if feasible 2026-01-11 14:19:18 +01:00
Antoni Zawodny
16b375e4ef Generalize ErrorChannel to other underlying types 2026-01-11 13:58:06 +01:00
Patrick Ohly
7a4d650125 DRA extended resources: fix flake in unit tests
The tests assumed that instantiating a DRAManager followed by
informerFactory.WaitForCacheSync would be enough to have the manager
up-to-date, but that's not correct: the test only waits for informer *caches*
to be synced, but syncing *event handlers* like the one in the manager may
still be going on. The flake rate is low, though:

    $ GOPATH/bin/stress -p 256 ./noderesources.test
    5s: 0 runs so far, 0 failures, 256 active
    10s: 256 runs so far, 0 failures, 256 active
    15s: 256 runs so far, 0 failures, 256 active
    20s: 512 runs so far, 0 failures, 256 active
    25s: 567 runs so far, 0 failures, 256 active
    30s: 771 runs so far, 0 failures, 256 active

    /tmp/go-stress-20251226T181044-974980161
    --- FAIL: TestCalculateResourceAllocatableRequest (0.81s)
        --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s)
            extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name"
            extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name"
            extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something"
            resource_allocation_test.go:595: Expected requested=2, but got requested=1
    FAIL

It becomes higher when changing WaitForCacheSync such that it doesn't poll and
therefore returns more promptly, which is where this flake was first observed.

The fix is to run the test in a syntest bubble where Wait can be used to wait
for all background activity, including event handling, to be finished before
proceeding with the test.

synctest is less forgiving about lingering goroutines. A synctest bubble must
wait for gouroutines to stop, which in this case means that there has to be
a way to wait for the metric recorder shutdown. Event handlers have to be
removed.

This could be done with plain Go, but here test/utils/ktesting is used instead
because it offers some advantages:
- less boilerplate code
- automatic cancellation of the context (i.e. less manual context.WithCancel)
- tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting
  sub-tests. synctest itself needs another anonymous function, which makes
  the line too long and forced re-indention:
     t.Run(... func(...) {
         synctest.Test(... func() {
         })
     })

For the sake of consistency all tests get updated.

While at it, some code gets improved:

- t.Fatal(err) is not a good way to report an error because
  there is no additional markup in the test output that indicates
  that there was an unexpected error. It just logs err.Error(),
  which might not be very informative and/or obvious.
- newTestDRAManager aborts in case of a failure instead of
  returning an error.
2025-12-27 09:47:56 +01:00
Patrick Ohly
ad79e479c2 build: remove deprecated '// +build' tag
This has been replaced by `//build:...` for a long time now.

Removal of the old build tag was automated with:

    for i in $(git grep -l '^// +build' | grep -v -e '^vendor/'); do if ! grep -q '^// Code generated' "$i"; then sed -i -e '/^\/\/ +build/d' "$i"; fi; done
2025-12-18 12:16:21 +01:00
bwsalmon
854e67bb51
KEP 5598: Opportunistic Batching (#135231)
* First version of batching w/out signatures.

* First version of pod signatures.

* Integrate batching with signatures.

* Fix merge conflicts.

* Fixes from self-review.

* Test fixes.

* Fix a bug that limited batches to size 2
Also add some new high-level logging and
simplify the pod affinity signature.

* Re-enable batching on perf tests for now.

* fwk.NewStatus(fwk.Success)

* Review feedback.

* Review feedback.

* Comment fix.

* Two plugin specific unit tests.:

* Add cycle state to the sign call, apply to topo spread.
Also add unit tests for several plugi signature
calls.

* Review feedback.

* Switch to distinct stats for hint and store calls.

* Switch signature from string to []byte

* Revert cyclestate in signs. Update node affinity.
Node affinity now sorts all of the various
nested arrays in the structure. CycleState no
longer in signature; revert to signing fewer
cases for pod spread.

* hack/update-vendor.sh

* Disable signatures when extenders are configured.

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update staging/src/k8s.io/kube-scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Disable node resource signatures when extended DRA enabled.

* Review feedback.

* Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/interface.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Update pkg/scheduler/framework/runtime/batch.go

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>

* Review feedback.

* Fixes for review suggestions.

* Add integration tests.

* Linter fixes, test fix.

* Whitespace fix.

* Remove broken test.

* Unschedulable test.

* Remove go.mod changes.

---------

Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-11-12 21:51:37 -08:00
Maciej Skoczeń
8d67173de0 Implement Gang scheduling in kube-scheduler 2025-11-06 10:47:29 +00:00
Hemant Kumar
002774c315 Address review comments 2025-11-04 11:16:43 -05:00
Hemant Kumar
fe3722dfa9 Address review comments
Change type name and stuff
2025-11-03 16:27:06 -05:00
Hemant Kumar
c71e45c735 Implement a csimanager for managing storage related assets 2025-10-31 11:06:58 -04:00
Hemant Kumar
7bbec73192 Add a interface for sharing CSINode objects between scheduler and CAS 2025-10-30 13:53:10 -04:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
Kensei Nakada
ac9fad6030 feat: trigger PreFilterPreBind in the binding cycle 2025-07-29 19:01:02 +09:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
Ania Borowiec
aecd37e6fb
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Kensei Nakada
ebae419337 feat: add PreBindPreFlight and implement in in-tree plugins 2025-07-05 17:14:21 -07:00
Ania Borowiec
ee8c265d35
Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503
Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Kubernetes Prow Robot
e0859f91b7
Merge pull request #131887 from ania-borowiec/extract_cyclestate_interface
Moving Scheduler interfaces to staging: split CycleState into interface and implementation, move interface to staging repo
2025-05-30 04:00:18 -07:00
Ania Borowiec
d75af825fb
Extract interface CycleState and move is to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework 2025-05-29 16:18:36 +00:00
Nick Parker
7f57c6e52d Update factory to use generics, keep single New function 2025-05-13 06:56:27 +12:00
Kubernetes Prow Robot
8a6b916765
Merge pull request #130720 from saintube/scheduler-expose-nodeinfo-in-prefilter
Expose NodeInfo to PreFilter plugins
2025-04-23 13:31:29 -07:00
saintube
8dc6806d26 Expose NodeInfo to PreFilter plugins and Framework
Co-authored-by: Zhan Sheng <49895476+AxeZhan@users.noreply.github.com>
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-21 14:55:25 +08:00
carlory
aab7a079fa make each scheduler test independent
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-03-13 14:39:50 +08:00
dom4ha
4deb4f2b5f Trigger rescheduling on delete event also when unscheduled pod is removed 2025-03-10 15:03:50 +00:00
saintube
afb4e96510 Expose NodeInfo to Score plugins
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-04 17:57:14 +08:00
Ania Borowiec
4205f04ce3
Replace uses of reflect.DeepEqual with cmp.Diff in pkg/scheduler tests 2025-02-26 09:27:51 +00:00
Kensei Nakada
c322294883 implement PodActivator to activate when preemption fails 2024-11-07 14:09:35 +09:00
Kuba Tużnik
87cd496a29 scheduler/framework: introduce pluggable SharedDRAManager
SharedDRAManager will be used by the DRA plugin to obtain DRA
objects, and to track modifications to them in-memory. The current
DRA plugin behavior will be the default implementation of
SharedDRAManager.

Plugging a different implementation will allow Cluster Autoscaler
to provide a simulated state of DRA objects to the DRA plugin when
making scheduling simulations, as well as obtain the modifications
to DRA objects from the plugin.
2024-11-05 13:52:57 +01:00
googs1025
69831b0043 chore(scheduler): refactor import package ordering 2024-09-18 20:31:03 +08:00
Richa Banker
6944af22d1 Initialize scheduler metrics after metrics options are applied 2024-09-13 12:52:32 -07:00
dom4ha
e7827879db Enable testing logger in the remaining scheduler tests. 2024-09-09 21:59:24 +00:00
Kensei Nakada
8519d3399f chore: move the scheduler internal components out of internal dir 2024-08-25 13:10:29 +09:00
Maciej Skoczeń
33815db3c1 Move NominatedPodsForNode to scheduling queue directly 2024-08-21 07:24:52 +00:00
Kubernetes Prow Robot
ea1143efc7
Merge pull request #126022 from macsko/new_node_to_status_map_structure
Change structure of NodeToStatus map in scheduler
2024-08-13 21:02:55 -07:00
Maciej Skoczeń
98be7dfc5d Change structure of NodeToStatus map in scheduler 2024-07-25 07:48:35 +00:00
Kubernetes Prow Robot
24fbb13eaf
Merge pull request #126113 from googs1025/enqueueExtensions_refactor
scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister()
2024-07-18 00:53:25 -07:00
googs1025
a3978e8315 scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister() 2024-07-18 12:22:17 +08:00
Kubernetes Prow Robot
d879103c28
Merge pull request #125820 from macsko/add_separate_lock_for_pod_nominator_scheduling_queue
Add a separate lock for pod nominator in scheduling queue
2024-07-17 12:06:10 -07:00
Maciej Skoczeń
5def93b10a Add a separate lock for pod nominator in scheduling queue 2024-07-17 07:58:59 +00:00
Kubernetes Prow Robot
31062790a1
Merge pull request #125855 from googs1025/refactor_scheduler_ut
chore: call close framework when finishing
2024-07-12 05:14:35 -07:00
googs1025
d4627f16a5 chore: call close framework when finishing
Signed-off-by: googs1025 <googs1025@gmail.com>
2024-07-12 18:11:04 +08:00
Kubernetes Prow Robot
b6899c5e08
Merge pull request #122251 from olderTaoist/unschedulable-plugin
register unschedulable plugin  for those plugins that PreFilter's PreFilterResult filter out some nodes
2024-07-05 05:44:26 -07:00
olderTaoist
b478621596 register unscheduable plugin when prefileter with NodeNames 2024-07-02 13:02:45 +08:00
Kubernetes Prow Robot
8c478a06d8
Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers
DRA: scheduler event handlers via assume cache
2024-06-25 11:56:28 -07:00
Patrick Ohly
9a6f3b9388 scheduler: central ResourceClaim assume cache
This enables connecting the event handler for ResourceClaim to the assume
cache, which addresses a theoretic race condition.

It may also be useful for implementing the autoscaler support, because now
the autoscaler can modify the content of the cache.
2024-06-25 14:00:25 +02:00
NoicFank
31a4b13238 enhancement(scheduler): share waitingPods among profiles 2024-05-17 17:07:27 +08:00
kerthcet
84750fe52e Revert "enhancement(scheduler): share waitingPods among profiles"
This reverts commit 227c1915db.
2024-03-19 22:52:59 +01:00
NoicFank
227c1915db enhancement(scheduler): share waitingPods among profiles 2024-02-01 10:06:23 +08:00
Kubernetes Prow Robot
5b979a3a53
Merge pull request #122498 from Gekko0114/close
Allow framework plugins to be closed
2024-01-08 17:30:36 +01:00
moriya
288c00c0c7 Allow framework plugins to be closed 2024-01-06 10:11:19 +09:00