51 commits

Author SHA1 Message Date
vshkrabkov
779ff43005 Add unschedulable pods metric drop for pod deletion 2026-01-07 15:17:27 +00:00
Manthan Parmar
41cde37f00 Update pkg/scheduler/backend/queue/scheduling_queue.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-12-30 15:05:51 +00:00
Manuel Grandeit
66d4bd3206 Fix data race in PriorityQueue.UnschedulablePods()
The UnschedulablePods() function iterates over the unschedulablePods.podInfoMap
without holding any lock, while other goroutines may concurrently modify the map
via addOrUpdate(), delete(), or clear().

Other functions like PendingPods() and GetPod() correctly acquire p.lock.RLock()
before accessing unschedulablePods.podInfoMap, but UnschedulablePods() was
missing this.

Fix by adding p.lock.RLock()/RUnlock() to UnschedulablePods(), matching the
pattern used by PendingPods().
2025-12-20 13:46:58 +01:00
Kubernetes Prow Robot
1757c6358b
Merge pull request #135368 from vshkrabkov/fix/scheduler-queue-metric-sync
Scheduler: Fix GatedPods metric desync in unschedulable queue
2025-12-17 21:42:00 -08:00
Vlad Shkrabkov
5be527b78e Scheduler: Fix GatedPods metric desync in unschedulable queue
Previously, when a Pod residing in the 'unschedulablePods' queue was updated and subsequently rejected by PreEnqueue plugins (returning 'Wait'), the logic in 'moveToActiveQ' would return early because the Pod was already present in the queue.

This caused the 'scheduler_gated_pods_total' metric to fail to increment, leading to metric inconsistencies (and potentially negative values upon Pod deletion).

This change adds a check to detect the transition from Ungated to Gated. If detected, the Pod is removed and re-added to the queue to ensure metrics are correctly swapped (Unschedulable-- and Gated++).

Added regression test 'TestSchedulingQueueMetrics_UngatedToGated' to verify the fix.

Signed-off-by: Vlad Shkrabkov <vshkrabkov@google.com>
2025-12-15 11:47:22 +00:00
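The metric swap described in the commit above (Unschedulable-- and Gated++) can be sketched as follows. This is a simplified model under stated assumptions, not the scheduler's implementation: the `metrics`, `podInfo`, and `queue` types and the `add`, `remove`, and `update` helpers are hypothetical, and the PreEnqueue result is reduced to a boolean `nowGated` argument.

```go
package main

import "fmt"

// metrics holds two hypothetical gauges standing in for the unschedulable
// and gated pod metrics.
type metrics struct{ unschedulable, gated int }

// podInfo records whether the pod was counted as gated when it was added.
type podInfo struct{ gated bool }

type queue struct {
	m    metrics
	pods map[string]*podInfo
}

// add stores the pod and increments the gauge matching its gated state.
func (q *queue) add(key string, gated bool) {
	q.pods[key] = &podInfo{gated: gated}
	if gated {
		q.m.gated++
	} else {
		q.m.unschedulable++
	}
}

// remove decrements whichever gauge the pod was counted under.
func (q *queue) remove(key string) {
	p, ok := q.pods[key]
	if !ok {
		return
	}
	if p.gated {
		q.m.gated--
	} else {
		q.m.unschedulable--
	}
	delete(q.pods, key)
}

// update models a pod update followed by a PreEnqueue verdict (nowGated).
// An early return for "already in the queue" would skip the swap; detecting
// the ungated-to-gated transition and re-adding keeps the gauges consistent.
func (q *queue) update(key string, nowGated bool) {
	p, ok := q.pods[key]
	if !ok {
		q.add(key, nowGated)
		return
	}
	if !p.gated && nowGated {
		q.remove(key)    // Unschedulable--
		q.add(key, true) // Gated++
	}
}

func main() {
	q := &queue{pods: map[string]*podInfo{}}
	q.add("ns/pod", false)
	q.update("ns/pod", true)
	fmt.Println(q.m.unschedulable, q.m.gated)
	q.remove("ns/pod")
	fmt.Println(q.m.unschedulable, q.m.gated)
}
```

In this model, deleting the pod after the transition decrements the gated gauge that was actually incremented, so neither gauge drops below zero.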
Mohammad Varmazyar
6a1a71ddc5 Removing the redundant WasFlushedFromUnschedulable 2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
bc632c72d0 scheduler: add metric for pods scheduled after flush
Add counter metric to track pods that schedule immediately after
being flushed from unschedulablePods due to timeout. Uses a boolean
flag that is cleared when pods return to queue or move via events.
2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
b2a399cf30 scheduler: add metric for pods scheduled after flush
This metric tracks pods that successfully schedule after being
flushed from unschedulablePods due to timeout. High values may
indicate missing queue hint optimizations or event handling issues.
2025-11-24 09:38:40 +01:00
Kubernetes Prow Robot
597a684bb0
Merge pull request #133172 from ania-borowiec/move_handle_and_plugin
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler
2025-09-08 06:05:31 -07:00
Maciej Skoczeń
4babdf8026 Fix race in movePodsToActiveOrBackoffQueue 2025-09-02 11:57:18 +00:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
Kubernetes Prow Robot
5fb3296920
Merge pull request #132451 from macsko/fix_race_in_scheduler_integration_tests
Fix race in scheduler integration tests
2025-08-31 05:03:09 -07:00
Maciej Skoczeń
46e10103ff Take activeQ lock for part of the Update method 2025-08-25 12:30:43 +00:00
Maciej Skoczeń
8b0b0df431 Don't run PreEnqueue when pod is activated from backoffQ 2025-08-22 12:40:41 +00:00
Maciej Skoczeń
aa59f930b3 Add lock to TestAsyncPreemption to prevent races 2025-08-05 09:43:12 +00:00
Maciej Skoczeń
c5ef720837 Fix race in scheduler integration tests 2025-08-05 09:42:52 +00:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
Ania Borowiec
aecd37e6fb
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Omar Nasser
45c355ca58 Move unschedulablePods struct to a separate file 2025-07-11 19:48:11 +03:00
Junhao Zou
1b730abf8d cleanup: use HandleErrorWithXXX instead of logger.Error where errors are intentionally ignored 2025-07-08 09:34:49 +08:00
Ania Borowiec
ee8c265d35
Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503
Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Kensei Nakada
adc4916dfe feat: introduce pInfo.UnschedulableCount to make the backoff calculation more appropriate 2025-05-17 12:39:58 +02:00
Kensei Nakada
d28c8cd488 fix: not removing the plugin from the unsched plugins after PreEnqueue 2025-05-07 14:12:23 +02:00
Kensei Nakada
47d296d62d feat: introduce pInfo.GatingPlugin to filter out events more generally 2025-05-07 13:54:47 +02:00
Ania Borowiec
17acc4a5ee
Move queue.Done() before Prebind, add tests 2025-03-20 22:14:36 +00:00
Maciej Skoczeń
c7919f5e22 Pop from the backoffQ when the activeQ is empty 2025-03-20 16:07:13 +00:00
Maciej Skoczeń
e367dca6c5 Change backoffQ less function to order pods by priority in windows 2025-03-19 13:04:15 +00:00
Maciej Skoczeń
9df0f6b604 Call PreEnqueue plugins before adding pod to backoffQ 2025-03-14 08:47:40 +00:00
Maciej Skoczeń
6975572a80 Add missing increments of queue_incoming_pods_total metric in scheduling queue 2025-03-04 12:37:22 +00:00
Maciej Skoczeń
0f24b9ff45 Split backoffQ into backoffQ and errorBackoffQ in scheduler 2025-02-24 14:11:26 +00:00
Kensei Nakada
105d489aa4 chore: wording 2024-11-07 14:09:35 +09:00
Kensei Nakada
ce377efa00 fix: improve logs 2024-11-07 14:09:35 +09:00
Kensei Nakada
49135d6173 fix: take QHint disable scenario into consideration 2024-11-07 14:09:35 +09:00
Kensei Nakada
623b2a20d2 fix: handle Activate event properly 2024-11-07 14:09:35 +09:00
Kensei Nakada
02459ca59c fix: register the event in in-flight as necessary at Activate 2024-11-07 14:09:35 +09:00
Kensei Nakada
089457e908 fix: check correctly if the event is scale down
Signed-off-by: Kensei Nakada <handbomusic@gmail.com>
2024-10-22 10:01:20 +09:00
Kensei Nakada
83f9e4b6df cleanup: remove event list 2024-10-18 11:10:10 +10:00
Kensei Nakada
a2b3a4f4dc chore: ensure the scheduler handles events before checking the pod position 2024-10-06 21:06:45 +09:00
Kensei Nakada
24a14aa810 fix: run a test for requeueing with PreFilterResult correctly 2024-09-07 23:52:45 +09:00
Kubernetes Prow Robot
f12334be03
Merge pull request #126962 from sanposhiho/memory-leak-scheduler
fix(scheduler): fix a possible memory leak for QueueingHint
2024-09-06 19:01:25 +01:00
Kubernetes Prow Robot
52d4972901
Merge pull request #127109 from sanposhiho/precheck-move
feat: disable preCheck when QHint is enabled
2024-09-05 17:19:57 +01:00
Kensei Nakada
0b71f256a8 fix(scheduler): fix a possible memory leak for QueueingHint 2024-09-05 12:13:05 +09:00
Kubernetes Prow Robot
05df9f4675
Merge pull request #127052 from sanposhiho/add-inflight-event-metric
feat(scheduler): support inflight_events metric
2024-09-04 19:56:19 +01:00
Kensei Nakada
4ee1394b71 feat: disable preCheck when QHint is enabled 2024-09-04 17:43:00 +09:00
Kensei Nakada
110d28355d feat(scheduler): support inflight_events metric 2024-09-02 10:16:43 +09:00
Kubernetes Prow Robot
59051eb003
Merge pull request #126029 from sanposhiho/backoff-preenqueue
scheduler: impose a backoff penalty on gated Pods
2024-08-28 21:58:01 +01:00
Kensei Nakada
b5a156971f scheduler: impose a backoff penalty on gated Pods 2024-08-27 09:57:59 +09:00
Kensei Nakada
baf69640d3 fix(scheduler_one): call Done() as soon as possible 2024-08-27 09:30:47 +09:00