Commit graph

34 commits

Patrick Ohly
7a4d650125 DRA extended resources: fix flake in unit tests
The tests assumed that instantiating a DRAManager followed by
informerFactory.WaitForCacheSync would be enough to have the manager
up-to-date, but that's not correct: the test only waits for informer *caches*
to be synced, but syncing *event handlers* like the one in the manager may
still be going on. The flake rate is low, though:

    $ GOPATH/bin/stress -p 256 ./noderesources.test
    5s: 0 runs so far, 0 failures, 256 active
    10s: 256 runs so far, 0 failures, 256 active
    15s: 256 runs so far, 0 failures, 256 active
    20s: 512 runs so far, 0 failures, 256 active
    25s: 567 runs so far, 0 failures, 256 active
    30s: 771 runs so far, 0 failures, 256 active

    /tmp/go-stress-20251226T181044-974980161
    --- FAIL: TestCalculateResourceAllocatableRequest (0.81s)
        --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s)
            extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name"
            extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name"
            extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something"
            resource_allocation_test.go:595: Expected requested=2, but got requested=1
    FAIL

The flake rate becomes higher when WaitForCacheSync is changed so that it does
not poll and therefore returns more promptly, which is the setup in which this
flake was first observed.

The fix is to run the test in a synctest bubble where Wait can be used to wait
for all background activity, including event handling, to be finished before
proceeding with the test.
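
A minimal sketch of that pattern with plain Go's testing/synctest package (the
setup step is a placeholder, not the actual test code):

    package example

    import (
        "testing"
        "testing/synctest"
    )

    func TestManagerUpToDate(t *testing.T) {
        synctest.Test(t, func(t *testing.T) {
            // Start informers and the DRA manager inside the bubble here
            // (placeholder for the real setup).

            // Wait blocks until every other goroutine in the bubble is
            // durably blocked, i.e. background work such as event handler
            // syncing has finished.
            synctest.Wait()

            // Assertions can now rely on the manager being up to date.
        })
    }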

synctest is less forgiving about lingering goroutines. A synctest bubble must
wait for goroutines to stop, which in this case means that there has to be
a way to wait for the metric recorder shutdown. Event handlers have to be
removed.

This could be done with plain Go, but here test/utils/ktesting is used instead
because it offers some advantages:
- less boilerplate code
- automatic cancellation of the context (i.e. less manual context.WithCancel)
- tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting
  sub-tests (see the sketch after this list). synctest itself needs another
  anonymous function, which makes the line too long and forces re-indentation:
     t.Run(... func(...) {
         synctest.Test(... func() {
         })
     })
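
A minimal sketch of the ktesting variant; the exact tCtx.SyncTest signature is
an assumption based on this description, not verified against test/utils/ktesting:

    package example

    import (
        "testing"

        "k8s.io/kubernetes/test/utils/ktesting"
    )

    func TestSomething(t *testing.T) {
        // Init hands out a TContext whose context is cancelled automatically
        // when the test ends.
        tCtx := ktesting.Init(t)

        // SyncTest is used like t.Run, but the sub-test body already runs in
        // a synctest bubble, so no extra anonymous function or re-indentation
        // is needed (callback signature assumed).
        tCtx.SyncTest("manager-up-to-date", func(tCtx ktesting.TContext) {
            // Set up informers and the DRA manager with tCtx here.
        })
    }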

For the sake of consistency all tests get updated.

While at it, some code gets improved:

- t.Fatal(err) is not a good way to report an error: there is no additional
  markup in the test output indicating that an unexpected error occurred, it
  just logs err.Error(), which might not be very informative or obvious
  (see the sketch below).
- newTestDRAManager aborts in case of a failure instead of
  returning an error.
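
A hedged sketch of both points; the manager type and setup function are
placeholders, not the real code:

    package example

    import "testing"

    type draManager struct{} // placeholder for the real manager type

    // setupDRAManager stands in for the real construction code.
    func setupDRAManager() (*draManager, error) { return &draManager{}, nil }

    // newTestDRAManager aborts the test itself on failure instead of
    // returning an error, and the failure message names the operation rather
    // than relying on t.Fatal(err), which would only print err.Error().
    func newTestDRAManager(t *testing.T) *draManager {
        t.Helper()
        mgr, err := setupDRAManager()
        if err != nil {
            t.Fatalf("creating DRA manager: %v", err)
        }
        return mgr
    }
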
2025-12-27 09:47:56 +01:00
Ed Bartosh
02a39d6c1e Balanced allocation tests: cover DRA resources
- Added DRA-aware test cases
- Pulled shared DRA setup out into a helper to keep tests DRY
- Added SignPod test
2025-12-12 13:51:19 +02:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
Ania Borowiec
ee8c265d35
Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Hongqi Yu
d76f40d2f3 fix(scheduler): skip best-effort pods in BalancedAllocation PreScore
- Refactored `PreScore` method in `balanced_allocation.go` to skip
  best-effort pods (see the sketch below).
- Updated unit tests in `balanced_allocation_test.go` to check for
  the new status codes.
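
A hedged sketch of the check implied above; the helper name is hypothetical
and this is not the upstream implementation. A best-effort pod is, roughly,
one whose containers declare no resource requests or limits, and for such
pods PreScore can return a Skip status so the plugin's Score is not run:

    package example

    import (
        v1 "k8s.io/api/core/v1"
    )

    // isBestEffort reports whether no container in the pod declares any
    // resource requests or limits (hypothetical helper, for illustration only).
    func isBestEffort(pod *v1.Pod) bool {
        containers := append([]v1.Container{}, pod.Spec.InitContainers...)
        containers = append(containers, pod.Spec.Containers...)
        for _, c := range containers {
            if len(c.Resources.Requests) > 0 || len(c.Resources.Limits) > 0 {
                return false
            }
        }
        return true
    }
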
2025-03-07 13:13:02 +08:00
saintube
afb4e96510 Expose NodeInfo to Score plugins
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-04 17:57:14 +08:00
Ania Borowiec
4205f04ce3
Replace uses of reflect.DeepEqual with cmp.Diff in pkg/scheduler tests 2025-02-26 09:27:51 +00:00
Kensei Nakada
8519d3399f chore: move the scheduler internal components out of internal dir 2024-08-25 13:10:29 +09:00
AxeZhan
be48c93689 Sched framework: expose NodeInfo in all functions of PluginsRunner interface 2023-12-15 11:30:06 +08:00
Mengjiao Liu
a7466f44e0 Change the scheduler plugins PluginFactory function to use context parameter to pass logger
- Migrated pkg/scheduler/framework/plugins/nodevolumelimits to use contextual logging
- Fix golangci-lint validation failures
- Check for plugin creation errors
2023-09-20 17:49:54 +08:00
Mengjiao Liu
1c05cf1d51 kube-scheduler: NewFramework function to pass the context parameter
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-05-23 10:17:34 +08:00
tangwz
3766e060e5 Avoid using negative words in PreFilter and PreScore tests. 2023-03-12 15:06:26 +08:00
tangwz
be080584c6
scheduler(NodeResourcesFit & NodeResourcesBalancedAllocation): calculatePodResourceRequest in PreScore phase (#115655)
* scheduler(NodeResourcesFit): calculatePodResourceRequest in PreScore phase

* scheduler(NodeResourcesFit and NodeResourcesBalancedAllocation): calculatePodResourceRequest in PreScore phase

* modify the comments and tests.

* revert the tests.

* don't need to consider nodes.

* use list instead of map.

* add comment for podRequests.

* avoid using negative wording in variable names.
2023-03-10 07:44:53 -08:00
kerthcet
97e3e50493 Remove potential goroutine leak in NewFramework
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-08-06 00:05:22 +08:00
Wojciech Tyczyński
7060953b92 Clear shutdown of scheduler metrics recorder 2022-05-20 20:23:29 +02:00
Yibo Zhuang
bc8f3198d5 cleanup: move scheduler plugin tests to use PodWrapper
Move scheduler plugin unit tests to use the testing PodWrapper
where applicable, to reduce duplicated pod creation
code and shorten the number of lines.

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-05 10:48:55 -07:00
Paco Xu
acd696266e mark PodOverhead to GA in v1.24; remove in v1.26 2022-03-17 09:30:14 +08:00
hyschumi
7ad629c864 rm makeNodeWithExtendedResource method in noderesources unit test && doc: update least_allocated strategy detail doc 2021-11-18 09:15:26 +08:00
Ahmad Diaa
a2c37bfd09
use original requests in NodeResourcesBalancedAllocation instead of NonZero (#105845) 2021-10-29 19:04:14 -07:00
Dave Chen
1fa673c15c Extend the NodeResourcesBalancedAllocation plugin to cover more resources
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-06-30 11:15:12 +08:00
ravisantoshgudimetla
b6c75bee15 Remove balanced attached node volumes
kubernetes#60525 introduced the Balanced attached node volumes
feature gate to include volume count when prioritizing nodes. The
flag was introduced because of its usefulness in the Red Hat
OpenShift Online environment, which is not being used any more.
So the flag is being removed, which helps the maintainability of
the scheduler code base, as mentioned at kubernetes#101489 (comment)
2021-06-22 11:19:30 -04:00
Aldo Culquicondor
63d1237102 Fix Node Resources plugins score when there are pods with no requests
Given that we assign default CPU/memory requests to containers that don't provide any, the calculated usage can exceed the allocatable.
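
A hedged illustration of the failure mode, not the upstream fix: once non-zero
default requests are summed, the total can exceed allocatable, so a scorer
computing requested/allocatable fractions has to clamp them:

    package example

    // fractionOfCapacity clamps the requested/allocatable ratio at 1 so that
    // defaulted requests exceeding allocatable cannot push the score out of
    // range (illustrative helper; name and behavior are hypothetical).
    func fractionOfCapacity(requested, allocatable int64) float64 {
        if allocatable == 0 {
            return 1 // treat a node with no capacity as fully used
        }
        f := float64(requested) / float64(allocatable)
        if f > 1 {
            f = 1
        }
        return f
    }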

Change-Id: I72e249652acacfbe8cea0dd6f895dabe43ff6376
2021-06-16 20:36:07 +00:00
Mike Dame
5a77ebe28b Scheduler: remove pkg/features dependency from NodeResources plugins 2021-05-18 08:59:02 -04:00
Mengxue Zhang
b38caa91cc make runtime.NewFramework accept KubeSchedulerProfile 2021-03-05 18:30:21 +00:00
Ali
09b2e8f638 Move scheduler interface to pkg/scheduler/framework 2020-10-13 13:13:27 +11:00
Ali Farah
a22e115a0e Split scheduler framework implementation into new runtime package 2020-06-22 00:23:43 +10:00
Dave Chen
6b9fb2c591 Fix nits in comments for NodeResources plugins 2020-05-14 18:45:59 +08:00
Abdullah Gharaibeh
2c51c13620 Scheduler NodeInfo cleanup 2020-04-09 19:03:51 -04:00
Abdullah Gharaibeh
b8ddd00312 scheduler's NodeInfo tracks PodInfos instead of Pods 2020-04-08 17:53:20 -04:00
Aldo Culquicondor
f53d7e55df Move Snapshot from nodeinfo/snapshot to internal/cache
Signed-off-by: Aldo Culquicondor <acondor@google.com>
2020-01-17 13:29:41 -05:00
Abdullah Gharaibeh
902cf48cd3 Updated NewSnapshot interface to accept a NodeInfoMap instead of lists of nodes and pods 2019-11-12 11:14:17 -05:00
Abdullah Gharaibeh
6b4bd87ba3 Remove Framework dependency on nodeinfo snapshot 2019-11-06 08:37:57 -05:00
Abdullah Gharaibeh
05cb382357 Update PredicateMetadataProducer to accept a scheduler SharedLister instead of nodeinfomap 2019-10-25 19:19:23 -04:00
zouyee
bd4167d149 remove unused meta and rename balance_allocated
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2019-10-19 22:43:15 +08:00
Renamed from pkg/scheduler/framework/plugins/noderesources/balance_allocated_test.go