kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-02-15 00:37:52 -05:00

Author	SHA1	Message	Date
Antoni Zawodny	833b7205fc	Run PreBind plugins in parallel if feasible	2026-01-11 14:19:18 +01:00
Antoni Zawodny	16b375e4ef	Generalize ErrorChannel to other underlying types	2026-01-11 13:58:06 +01:00
Patrick Ohly	7a4d650125	DRA extended resources: fix flake in unit tests The tests assumed that instantiating a DRAManager followed by informerFactory.WaitForCacheSync would be enough to have the manager up-to-date, but that's not correct: the test only waits for informer caches to be synced, but syncing event handlers like the one in the manager may still be going on. The flake rate is low, though: $ GOPATH/bin/stress -p 256 ./noderesources.test 5s: 0 runs so far, 0 failures, 256 active 10s: 256 runs so far, 0 failures, 256 active 15s: 256 runs so far, 0 failures, 256 active 20s: 512 runs so far, 0 failures, 256 active 25s: 567 runs so far, 0 failures, 256 active 30s: 771 runs so far, 0 failures, 256 active /tmp/go-stress-20251226T181044-974980161 --- FAIL: TestCalculateResourceAllocatableRequest (0.81s) --- FAIL: TestCalculateResourceAllocatableRequest/DRA-backed-resource-with-shared-device-allocation (0.00s) extendedresourcecache.go:197: I1226 18:11:14.431337] Updated extended resource cache for explicit mapping extendedResource="extended.resource.dra.io/something" deviceClass="device-class-name" extendedresourcecache.go:204: I1226 18:11:14.431380] Updated extended resource cache for default mapping extendedResource="deviceclass.resource.kubernetes.io/device-class-name" deviceClass="device-class-name" extendedresourcecache.go:220: I1226 18:11:14.431394] Updated device class mapping deviceClass="device-class-name" extendedResource="extended.resource.dra.io/something" resource_allocation_test.go:595: Expected requested=2, but got requested=1 FAIL It becomes higher when changing WaitForCacheSync such that it doesn't poll and therefore returns more promptly, which is where this flake was first observed. The fix is to run the test in a syntest bubble where Wait can be used to wait for all background activity, including event handling, to be finished before proceeding with the test. synctest is less forgiving about lingering goroutines. A synctest bubble must wait for gouroutines to stop, which in this case means that there has to be a way to wait for the metric recorder shutdown. Event handlers have to be removed. This could be done with plain Go, but here test/utils/ktesting is used instead because it offers some advantages: - less boilerplate code - automatic cancellation of the context (i.e. less manual context.WithCancel) - tCtx.SyncTest is a direct substitute for t.Run, which avoids re-indenting sub-tests. synctest itself needs another anonymous function, which makes the line too long and forced re-indention: t.Run(... func(...) { synctest.Test(... func() { }) }) For the sake of consistency all tests get updated. While at it, some code gets improved: - t.Fatal(err) is not a good way to report an error because there is no additional markup in the test output that indicates that there was an unexpected error. It just logs err.Error(), which might not be very informative and/or obvious. - newTestDRAManager aborts in case of a failure instead of returning an error.	2025-12-27 09:47:56 +01:00
bwsalmon	854e67bb51	KEP 5598: Opportunistic Batching (#135231 ) * First version of batching w/out signatures. * First version of pod signatures. * Integrate batching with signatures. * Fix merge conflicts. * Fixes from self-review. * Test fixes. * Fix a bug that limited batches to size 2 Also add some new high-level logging and simplify the pod affinity signature. * Re-enable batching on perf tests for now. * fwk.NewStatus(fwk.Success) * Review feedback. * Review feedback. * Comment fix. * Two plugin specific unit tests.: * Add cycle state to the sign call, apply to topo spread. Also add unit tests for several plugi signature calls. * Review feedback. * Switch to distinct stats for hint and store calls. * Switch signature from string to []byte * Revert cyclestate in signs. Update node affinity. Node affinity now sorts all of the various nested arrays in the structure. CycleState no longer in signature; revert to signing fewer cases for pod spread. * hack/update-vendor.sh * Disable signatures when extenders are configured. * Update pkg/scheduler/framework/runtime/batch.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update staging/src/k8s.io/kube-scheduler/framework/interface.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Review feedback. * Disable node resource signatures when extended DRA enabled. * Review feedback. * Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/interface.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Update pkg/scheduler/framework/runtime/batch.go Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com> * Review feedback. * Fixes for review suggestions. * Add integration tests. * Linter fixes, test fix. * Whitespace fix. * Remove broken test. * Unschedulable test. * Remove go.mod changes. --------- Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>	2025-11-12 21:51:37 -08:00
Maciej Skoczeń	8d67173de0	Implement Gang scheduling in kube-scheduler	2025-11-06 10:47:29 +00:00
Hemant Kumar	002774c315	Address review comments	2025-11-04 11:16:43 -05:00
Hemant Kumar	fe3722dfa9	Address review comments Change type name and stuff	2025-11-03 16:27:06 -05:00
Hemant Kumar	c71e45c735	Implement a csimanager for managing storage related assets	2025-10-31 11:06:58 -04:00
Hemant Kumar	7bbec73192	Add a interface for sharing CSINode objects between scheduler and CAS	2025-10-30 13:53:10 -04:00
Ania Borowiec	fadb40199f	Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler	2025-09-02 09:42:53 +00:00
Kensei Nakada	ac9fad6030	feat: trigger PreFilterPreBind in the binding cycle	2025-07-29 19:01:02 +09:00
Maciej Skoczeń	17d733e243	KEP-5229: Send API calls through dispatcher and cache	2025-07-25 15:35:36 +00:00
Ania Borowiec	aecd37e6fb	Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler	2025-07-24 12:10:58 +00:00
Ania Borowiec	ee8c265d35	Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework	2025-06-30 10:06:22 +00:00
Ania Borowiec	00d3750503	Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190 ) * Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes apply review comment and fix linter warning * update-vendor.sh * update doc comments * run update-vendor.sh	2025-06-26 08:06:29 -07:00
Ania Borowiec	d75af825fb	Extract interface CycleState and move is to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework	2025-05-29 16:18:36 +00:00
Kubernetes Prow Robot	8a6b916765	Merge pull request #130720 from saintube/scheduler-expose-nodeinfo-in-prefilter Expose NodeInfo to PreFilter plugins	2025-04-23 13:31:29 -07:00
saintube	8dc6806d26	Expose NodeInfo to PreFilter plugins and Framework Co-authored-by: Zhan Sheng <49895476+AxeZhan@users.noreply.github.com> Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com> Signed-off-by: saintube <saintube@foxmail.com>	2025-03-21 14:55:25 +08:00
dom4ha	4deb4f2b5f	Trigger rescheduling on delete event also when unscheduled pod is removed	2025-03-10 15:03:50 +00:00
saintube	afb4e96510	Expose NodeInfo to Score plugins Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com> Signed-off-by: saintube <saintube@foxmail.com>	2025-03-04 17:57:14 +08:00
Kensei Nakada	c322294883	implement PodActivator to activate when preemption fails	2024-11-07 14:09:35 +09:00
Kuba Tużnik	87cd496a29	scheduler/framework: introduce pluggable SharedDRAManager SharedDRAManager will be used by the DRA plugin to obtain DRA objects, and to track modifications to them in-memory. The current DRA plugin behavior will be the default implementation of SharedDRAManager. Plugging a different implementation will allow Cluster Autoscaler to provide a simulated state of DRA objects to the DRA plugin when making scheduling simulations, as well as obtain the modifications to DRA objects from the plugin.	2024-11-05 13:52:57 +01:00
Kubernetes Prow Robot	ea1143efc7	Merge pull request #126022 from macsko/new_node_to_status_map_structure Change structure of NodeToStatus map in scheduler	2024-08-13 21:02:55 -07:00
Maciej Skoczeń	98be7dfc5d	Change structure of NodeToStatus map in scheduler	2024-07-25 07:48:35 +00:00
googs1025	a3978e8315	scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister()	2024-07-18 12:22:17 +08:00
Kubernetes Prow Robot	b6899c5e08	Merge pull request #122251 from olderTaoist/unschedulable-plugin register unschedulable plugin for those plugins that PreFilter's PreFilterResult filter out some nodes	2024-07-05 05:44:26 -07:00
olderTaoist	b478621596	register unscheduable plugin when prefileter with NodeNames	2024-07-02 13:02:45 +08:00
Kubernetes Prow Robot	8c478a06d8	Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers DRA: scheduler event handlers via assume cache	2024-06-25 11:56:28 -07:00
Patrick Ohly	9a6f3b9388	scheduler: central ResourceClaim assume cache This enables connecting the event handler for ResourceClaim to the assume cache, which addresses a theoretic race condition. It may also be useful for implementing the autoscaler support, because now the autoscaler can modify the content of the cache.	2024-06-25 14:00:25 +02:00
NoicFank	31a4b13238	enhancement(scheduler): share waitingPods among profiles	2024-05-17 17:07:27 +08:00
kerthcet	84750fe52e	Revert "enhancement(scheduler): share waitingPods among profiles" This reverts commit `227c1915db`.	2024-03-19 22:52:59 +01:00
NoicFank	227c1915db	enhancement(scheduler): share waitingPods among profiles	2024-02-01 10:06:23 +08:00
Kubernetes Prow Robot	5b979a3a53	Merge pull request #122498 from Gekko0114/close Allow framework plugins to be closed	2024-01-08 17:30:36 +01:00
moriya	288c00c0c7	Allow framework plugins to be closed	2024-01-06 10:11:19 +09:00
Kensei Nakada	09abd6be5a	address reviews	2024-01-02 02:10:41 +00:00
Kensei Nakada	5ab2317947	run all PreFilter when the preemption will happen later in the same scheduling cycle	2024-01-01 09:44:06 +00:00
AxeZhan	be48c93689	Sched framework: expose NodeInfo in all functions of PluginsRunner interface	2023-12-15 11:30:06 +08:00
Kubernetes Prow Robot	84424a8c19	Merge pull request #122068 from caohe/fix-multi-point fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times	2023-12-14 05:10:37 +01:00
Kubernetes Prow Robot	5322af7f9e	Merge pull request #122022 from sanposhiho/extender-fix fix: requeue pods rejected by Extenders properly	2023-12-14 05:10:01 +01:00
Kubernetes Prow Robot	badc4102ac	Merge pull request #121572 from Prateek462003/myFeature Added Logging for all the enabled plugins in each extension point	2023-12-13 22:34:06 +01:00
caohe	1f5738df84	fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times Signed-off-by: caohe <caohe9603@gmail.com>	2023-11-29 20:14:18 +08:00
hub-Prateek	a601ebd6b6	Changed the log message	2023-11-26 11:41:42 +05:30
hub-Prateek	eb45a8f2f5	Added comments	2023-11-24 11:01:15 +05:30
hub-Prateek	76be319571	Optimzed the code	2023-11-24 10:58:33 +05:30
hub-Prateek	5c99f3a24e	Logged the return value of ListPlugins	2023-11-24 00:19:42 +05:30
Kensei Nakada	468e2dac81	fix: requeue pods rejected by Extenders properly	2023-11-23 13:20:02 +00:00
hub-Prateek	9cb2d1cf6d	Removed Comments	2023-11-22 22:32:19 +05:30
hub-Prateek	1dca49157a	Utilized ListPlugins method	2023-11-22 02:13:55 +05:30
Patrick Ohly	2a23061f6c	scheduler: fix performance regression at -v3 + contextual logging The logging instrumentation for contextual logging that was added for 1.29 slowed down the scheduler (i.e. logging verbosity <= 3) by a significant percentage (-28.66% for SchedulingBasic/5000Nodes at -v3) if (and only if!) contextual logging was enabled. Retrieving the logger from the context causes no measurable slowdown, it's only the various WithName/WithValues calls which cause this. By being more careful about when to use those, the performance impact can be avoided: - At -v3 or lower, only `WithValues("pod")` is used once per scheduling cycle. This has the intended effect that all log messages for the cycle include the pod information. Once contextual logging is GA, "pod" key/value pairs can be removed from all log calls. - At -v4 or higher, richer log entries get produced where `WithValues` is also used for the node (when applicable) and `WithName` is used for the current operation and plugin. With these changes, enabling contextual logging causes no measurable slowdown at -v3 or lower. At -v4, the slowdown depends on the test case (-30.51% throughput for SchedulingBasic/5000Nodes, no change for SchedulingCSIPVs/5000Nodes). For some unknown reason (measuring bias?), SchedulingCSIPVs/500Nodes has a ~3& higher throughput with contextual logging.	2023-11-03 17:28:55 +01:00
hub-Prateek	7b60e7e2a3	Added plugins enabled at each extension point	2023-11-01 23:03:13 +05:30

1 2 3 4

163 commits