76 commits

Author SHA1 Message Date
vshkrabkov
b78cdbfdf4 Adds test cases for multiple preEnqueue plugins 2026-01-09 15:35:48 +00:00
vshkrabkov
779ff43005 Add unschedulable pods metric drop for pod deletion 2026-01-07 15:17:27 +00:00
Manthan Parmar
41cde37f00 Update pkg/scheduler/backend/queue/scheduling_queue.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-12-30 15:05:51 +00:00
Manuel Grandeit
66d4bd3206 Fix data race in PriorityQueue.UnschedulablePods()
The UnschedulablePods() function iterates over the unschedulablePods.podInfoMap
without holding any lock, while other goroutines may concurrently modify the map
via addOrUpdate(), delete(), or clear().

Other functions like PendingPods() and GetPod() correctly acquire p.lock.RLock()
before accessing unschedulablePods.podInfoMap, but UnschedulablePods() was
missing this.

Fix by adding p.lock.RLock()/RUnlock() to UnschedulablePods(), matching the
pattern used by PendingPods().
2025-12-20 13:46:58 +01:00
Kubernetes Prow Robot
1757c6358b Merge pull request #135368 from vshkrabkov/fix/scheduler-queue-metric-sync
Scheduler: Fix GatedPods metric desync in unschedulable queue
2025-12-17 21:42:00 -08:00
Vlad Shkrabkov
5be527b78e Scheduler: Fix GatedPods metric desync in unschedulable queue
Previously, when a Pod residing in the 'unschedulablePods' queue was updated and subsequently rejected by PreEnqueue plugins (returning 'Wait'), the logic in 'moveToActiveQ' would return early because the Pod was already present in the queue.

This caused the 'scheduler_gated_pods_total' metric to fail to increment, leading to metric inconsistencies (and potentially negative values upon Pod deletion).

This change adds a check to detect the transition from Ungated to Gated. If detected, the Pod is removed and re-added to the queue to ensure metrics are correctly swapped (Unschedulable-- and Gated++).

Added regression test 'TestSchedulingQueueMetrics_UngatedToGated' to verify the fix.

Signed-off-by: Vlad Shkrabkov <vshkrabkov@google.com>
2025-12-15 11:47:22 +00:00
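The remove-and-re-add approach from this commit message can be sketched roughly as below. All names here (`unschedQueue`, `podInfo`, `metrics`, `update`) are hypothetical and chosen for illustration; the real logic lives in moveToActiveQ and the scheduler's metric recorders, which this sketch does not reproduce.

```go
package main

import "fmt"

// podInfo is a simplified, hypothetical queued pod record.
type podInfo struct {
	name  string
	gated bool
}

// metrics stands in for the unschedulable and gated pod gauges.
type metrics struct {
	unschedulable int
	gated         int
}

type unschedQueue struct {
	pods map[string]*podInfo
	m    metrics
}

// add stores the pod and increments the gauge matching its gated state.
func (q *unschedQueue) add(p *podInfo) {
	q.pods[p.name] = p
	if p.gated {
		q.m.gated++
	} else {
		q.m.unschedulable++
	}
}

// remove deletes the pod and decrements the gauge it was counted under.
func (q *unschedQueue) remove(name string) {
	p, ok := q.pods[name]
	if !ok {
		return
	}
	delete(q.pods, name)
	if p.gated {
		q.m.gated--
	} else {
		q.m.unschedulable--
	}
}

// update re-evaluates gating after a pod update. On an Ungated -> Gated
// transition it removes and re-adds the pod so the gauges are swapped,
// instead of returning early because the pod is already queued.
func (q *unschedQueue) update(name string, nowGated bool) {
	p, ok := q.pods[name]
	if !ok {
		return
	}
	if !p.gated && nowGated {
		q.remove(name)
		q.add(&podInfo{name: name, gated: true})
	}
}

func main() {
	q := &unschedQueue{pods: map[string]*podInfo{}}
	q.add(&podInfo{name: "pod-a"})
	q.update("pod-a", true) // a PreEnqueue plugin now rejects the pod
	fmt.Printf("after update: %+v\n", q.m)
	q.remove("pod-a") // pod deleted
	fmt.Printf("after delete: %+v\n", q.m)
}
```

Because the pod is counted under the gated gauge after the transition, the later deletion decrements the same gauge it incremented, and neither value can go negative.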
Mohammad Varmazyar
4c2fff1934 Address comments, log level, test assertion consistency and remove unnecessary locks in TestFlushUnschedulablePodsLeftoverSetsFlag 2025-11-26 14:08:05 +01:00
Mohammad Varmazyar
4f455c9c0d Refactor plugin clearing to use ClearRejectorPlugins method 2025-11-26 09:54:32 +01:00
Mohammad Varmazyar
d64e09c697 Clear plugins at handleSchedulingFailure and preserve both at Pop 2025-11-24 20:32:41 +01:00
Mohammad Varmazyar
ec05bcf186 test: simplify TestFlushUnschedulablePodsLeftoverSetsFlag
scheduler: add logging for pods scheduled after flush and preserve UnschedulablePlugins
2025-11-24 09:55:52 +01:00
Mohammad Varmazyar
e5e8ef993c Add unit test for WasFlushedFromUnschedulable flag 2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
6a1a71ddc5 Removing the redundant WasFlushedFromUnschedulable 2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
bc632c72d0 scheduler: add metric for pods scheduled after flush
Add counter metric to track pods that schedule immediately after
being flushed from unschedulablePods due to timeout. Uses a boolean
flag that is cleared when pods return to queue or move via events.
2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
b2a399cf30 scheduler: add metric for pods scheduled after flush
This metric tracks pods that successfully schedule after being
flushed from unschedulablePods due to timeout. High values may
indicate missing queue hint optimizations or event handling issues.
2025-11-24 09:38:40 +01:00
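The flag-plus-counter idea in the two commit messages above might look like the sketch below. It is an assumption-laden illustration: `queuedPod`, `tracker`, the `wasFlushed` field, and every method name are hypothetical, and the real metric is registered through the scheduler's metrics package rather than kept as a plain integer.

```go
package main

import "fmt"

// queuedPod is a hypothetical queued pod record carrying the boolean flag.
type queuedPod struct {
	name       string
	wasFlushed bool // set only when the pod left unschedulablePods by timeout
}

// tracker stands in for the counter metric.
type tracker struct {
	scheduledAfterFlush int
}

// flushOnTimeout marks a pod that was moved out because it waited too long.
func (t *tracker) flushOnTimeout(p *queuedPod) { p.wasFlushed = true }

// moveByEvent clears the flag: the pod moved because of a cluster event.
func (t *tracker) moveByEvent(p *queuedPod) { p.wasFlushed = false }

// returnToQueue clears the flag: the attempt failed and the pod went back.
func (t *tracker) returnToQueue(p *queuedPod) { p.wasFlushed = false }

// scheduled counts the pod only if the flag survived until scheduling.
func (t *tracker) scheduled(p *queuedPod) {
	if p.wasFlushed {
		t.scheduledAfterFlush++
	}
	p.wasFlushed = false
}

func main() {
	tr := &tracker{}

	a := &queuedPod{name: "pod-a"}
	tr.flushOnTimeout(a)
	tr.scheduled(a) // counted: scheduled right after the flush

	b := &queuedPod{name: "pod-b"}
	tr.flushOnTimeout(b)
	tr.returnToQueue(b) // first attempt failed, flag cleared
	tr.scheduled(b)     // not counted

	fmt.Println("scheduled after flush:", tr.scheduledAfterFlush)
}
```

A pod counted this way became schedulable without any event moving it, which is why the second message reads high values as a hint of missing queueing hints or event handling gaps.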
Kubernetes Prow Robot
597a684bb0 Merge pull request #133172 from ania-borowiec/move_handle_and_plugin
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler
2025-09-08 06:05:31 -07:00
Maciej Skoczeń
4babdf8026 Fix race in movePodsToActiveOrBackoffQueue 2025-09-02 11:57:18 +00:00
Ania Borowiec
fadb40199f Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
Kubernetes Prow Robot
5fb3296920 Merge pull request #132451 from macsko/fix_race_in_scheduler_integration_tests
Fix race in scheduler integration tests
2025-08-31 05:03:09 -07:00
Maciej Skoczeń
46e10103ff Take activeQ lock for part of the Update method 2025-08-25 12:30:43 +00:00
Maciej Skoczeń
8b0b0df431 Don't run PreEnqueue when pod is activated from backoffQ 2025-08-22 12:40:41 +00:00
Maciej Skoczeń
aa59f930b3 Add lock to TestAsyncPreemption to prevent races 2025-08-05 09:43:12 +00:00
Maciej Skoczeń
c5ef720837 Fix race in scheduler integration tests 2025-08-05 09:42:52 +00:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
Ania Borowiec
aecd37e6fb Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Omar Nasser
45c355ca58 Move unschedulablePods struct to a separate file 2025-07-11 19:48:11 +03:00
Junhao Zou
1b730abf8d cleanup: use HandleErrorWithXXX instead of logger.Error where errors are intentionally ignored 2025-07-08 09:34:49 +08:00
Ania Borowiec
ee8c265d35 Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503 Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Kensei Nakada
f694c58c6c feat: graduate QueueingHint to GA 2025-05-26 21:23:46 +02:00
Maciej Skoczeń
157903b09b Skip backoff when PodMaxBackoffDuration is set to zero 2025-05-26 09:35:53 +00:00
Kensei Nakada
adc4916dfe feat: introduce pInfo.UnschedulableCount to make the backoff calculation more appropriate 2025-05-17 12:39:58 +02:00
Kubernetes Prow Robot
0113538e59 Merge pull request #127180 from sanposhiho/general-gate
feat: introduce pInfo.GatingPlugin to filter out events more generally
2025-05-14 05:13:18 -07:00
Kensei Nakada
5140786829 feat: improve the backoff calculation to O(1) 2025-05-12 01:26:47 +02:00
Kensei Nakada
d28c8cd488 fix: not removing the plugin from the unsched plugins after PreEnqueue 2025-05-07 14:12:23 +02:00
Kensei Nakada
47d296d62d feat: introduce pInfo.GatingPlugin to filter out events more generally 2025-05-07 13:54:47 +02:00
Ania Borowiec
17acc4a5ee Move queue.Done() before Prebind, add tests 2025-03-20 22:14:36 +00:00
Maciej Skoczeń
c7919f5e22 Pop from the backoffQ when the activeQ is empty 2025-03-20 16:07:13 +00:00
Kubernetes Prow Robot
65d9066665 Merge pull request #130680 from macsko/update_backoffq_less_function_to_order_by_priority_in_windows
Update backoffQ's less function to order pods by priority in windows
2025-03-20 01:36:31 -07:00
Maciej Skoczeń
e367dca6c5 Change backoffQ less function to order pods by priority in windows 2025-03-19 13:04:15 +00:00
Maciej Skoczeń
1be3f8961b Fix a race when closing activeQ 2025-03-18 10:25:56 +00:00
Maciej Skoczeń
9df0f6b604 Call PreEnqueue plugins before adding pod to backoffQ 2025-03-14 08:47:40 +00:00
carlory
aab7a079fa make each scheduler test independent
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-03-13 14:39:50 +08:00
Maciej Skoczeń
2fc3cd90b1 Store pod backoff expiration time in QueuedPodInfo 2025-03-06 10:45:38 +00:00
Maciej Skoczeń
6975572a80 Add missing increments of queue_incoming_pods_total metric in scheduling queue 2025-03-04 12:37:22 +00:00
Maciej Skoczeń
0f24b9ff45 Split backoffQ into backoffQ and errorBackoffQ in scheduler 2025-02-24 14:11:26 +00:00
Maciej Skoczeń
0452ae402a Use cached calculateResource result when removing pod from NodeInfo in preemption 2025-01-21 10:02:57 +00:00
Kubernetes Prow Robot
fb033826a8 Merge pull request #128170 from sanposhiho/async-preemption
feature(KEP-4832): asynchronous preemption
2024-11-07 19:44:54 +00:00
Kensei Nakada
b96eee847e feat: graduate SchedulerQueueingHints to beta 2024-11-07 21:45:18 +09:00
Kensei Nakada
105d489aa4 chore: wording 2024-11-07 14:09:35 +09:00