kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-02-15 16:58:05 -05:00

Author	SHA1	Message	Date
Patrick Ohly	7a4d650125	DRA extended resources: fix flake in unit tests The tests assumed that instantiating a DRAManager followed by informerFactory.WaitForCacheSync would be enough to have the manager up-to-date, but that's not correct: the test only waits for informer caches to be synced, but syncing event handlers like the one in the manager may still be going on. The flake rate is low, though: $ GOPATH/bin/stress -p 256 ./noderesources.test 5s: 0 runs so far, 0 failures, 256 active 10s: 256 runs so far, 0 failures, 256 active 15s: 256 runs so far, 0 failures, 256 active 20s: 512 runs so far, 0 failures, 256 active 25s: 567 runs so far, 0 failures, 256 active 30s: 771 runs so far, 0 failures, 256 active /tmp/go-stress-20251226T181044-974980161 --- FAIL: TestCalculateResourceAllocatableRequest (0.81s) --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s) extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name" extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name" extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something" resource_allocation_test.go:595: Expected requested=2, but got requested=1 FAIL It becomes higher when changing WaitForCacheSync such that it doesn't poll and therefore returns more promptly, which is where this flake was first observed. The fix is to run the test in a synctest bubble where Wait can be used to wait for all background activity, including event handling, to be finished before proceeding with the test. synctest is less forgiving about lingering goroutines. 
A synctest bubble must wait for goroutines to stop, which in this case means that there has to be a way to wait for the metric recorder shutdown. Event handlers have to be removed. This could be done with plain Go, but here test/utils/ktesting is used instead because it offers some advantages: - less boilerplate code - automatic cancellation of the context (i.e. less manual context.WithCancel) - tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting sub-tests. synctest itself needs another anonymous function, which makes the line too long and forces re-indentation: t.Run(... func(...) { synctest.Test(... func() { }) }) For the sake of consistency all tests get updated. While at it, some code gets improved: - t.Fatal(err) is not a good way to report an error because there is no additional markup in the test output that indicates that there was an unexpected error. It just logs err.Error(), which might not be very informative and/or obvious. - newTestDRAManager aborts in case of a failure instead of returning an error.	2025-12-27 09:47:56 +01:00
Ed Bartosh	7860effc2c	resourceAllocationScorer: add unit test for DRA nodeMatches	2025-12-12 15:48:13 +02:00
Ed Bartosh	02a39d6c1e	Balanced allocation tests: cover DRA resources - Added DRA-aware test cases - Pulled shared DRA setup out into helper to keep tests DRY - Added SignPod test	2025-12-12 13:51:19 +02:00
Ed Bartosh	fc404b6a3d	Cache DRA state for scoring extended resources Extend Fit and BalancedAllocation PreScore state with the allocated state, the list of ResourceSlices and the device class mapping. Gather these once during PreScore and pass them through the scoring path instead of re-fetching for every scoring call. This should speed up scoring of DRA extended resources, lowering scheduling overhead. Co-authored-by: Patrick Ohly <patrick.ohly@intel.com> Co-authored-by: Maciej Skoczeń <mskoczen@google.com> Co-authored-by: Dominik Marciński <gmidon@gmail.com>	2025-11-06 18:09:11 +02:00
Ed Bartosh	edbc32fa60	DRA: implement scoring for extended resources Updated extended resource allocation scorer to calculate allocatable and requested values for DRA-backed resources.	2025-11-06 10:40:52 +02:00
Todd Neal	4096c9209c	dedupe pod resource request calculation	2023-03-09 17:15:53 -06:00

6 commits