vshkrabkov
779ff43005
Add unschedulable pods metric drop for pod deletion
2026-01-07 15:17:27 +00:00
Manthan Parmar
41cde37f00
Update pkg/scheduler/backend/queue/scheduling_queue.go
...
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
2025-12-30 15:05:51 +00:00
Manuel Grandeit
66d4bd3206
Fix data race in PriorityQueue.UnschedulablePods()
...
The UnschedulablePods() function iterates over the unschedulablePods.podInfoMap
without holding any lock, while other goroutines may concurrently modify the map
via addOrUpdate(), delete(), or clear().
Other functions like PendingPods() and GetPod() correctly acquire p.lock.RLock()
before accessing unschedulablePods.podInfoMap, but UnschedulablePods() was
missing this.
Fix by adding p.lock.RLock()/RUnlock() to UnschedulablePods(), matching the
pattern used by PendingPods().
2025-12-20 13:46:58 +01:00
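The locking pattern described in commit 66d4bd3206 can be sketched roughly as follows. This is a simplified illustration, not the scheduler's actual code: `podQueue`, `newPodQueue`, `remove`, and the string-valued map are hypothetical stand-ins, and only the use of `RLock()`/`RUnlock()` around the map iteration mirrors the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// podQueue is a hypothetical stand-in for the scheduler's PriorityQueue.
// Only the locking pattern is meant to match the commit description.
type podQueue struct {
	lock       sync.RWMutex
	podInfoMap map[string]string // pod key -> pod name (simplified)
}

func newPodQueue() *podQueue {
	return &podQueue{podInfoMap: make(map[string]string)}
}

// addOrUpdate mutates the map, so it takes the write lock.
func (q *podQueue) addOrUpdate(key, name string) {
	q.lock.Lock()
	defer q.lock.Unlock()
	q.podInfoMap[key] = name
}

// remove also mutates the map and takes the write lock.
func (q *podQueue) remove(key string) {
	q.lock.Lock()
	defer q.lock.Unlock()
	delete(q.podInfoMap, key)
}

// unschedulablePods copies the map values while holding the read lock,
// so concurrent writers cannot modify the map during iteration.
func (q *podQueue) unschedulablePods() []string {
	q.lock.RLock()
	defer q.lock.RUnlock()
	pods := make([]string, 0, len(q.podInfoMap))
	for _, name := range q.podInfoMap {
		pods = append(pods, name)
	}
	return pods
}

func main() {
	q := newPodQueue()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Writers and readers run concurrently; the lock keeps this safe.
			q.addOrUpdate(fmt.Sprintf("ns/pod-%d", i), fmt.Sprintf("pod-%d", i))
			_ = q.unschedulablePods()
		}(i)
	}
	wg.Wait()
	fmt.Println(len(q.unschedulablePods()))
}
```

Without the read lock in `unschedulablePods`, running a program like this with `go run -race` would be expected to report the concurrent map read and write.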
Kubernetes Prow Robot
1757c6358b
Merge pull request #135368 from vshkrabkov/fix/scheduler-queue-metric-sync
...
Scheduler: Fix GatedPods metric desync in unschedulable queue
2025-12-17 21:42:00 -08:00
Vlad Shkrabkov
5be527b78e
Scheduler: Fix GatedPods metric desync in unschedulable queue
...
Previously, when a Pod residing in the 'unschedulablePods' queue was updated and subsequently rejected by PreEnqueue plugins (returning 'Wait'), the logic in 'moveToActiveQ' would return early because the Pod was already present in the queue.
This caused the 'scheduler_gated_pods_total' metric to fail to increment, leading to metric inconsistencies (and potentially negative values upon Pod deletion).
This change adds a check to detect the transition from Ungated to Gated. If detected, the Pod is removed and re-added to the queue to ensure metrics are correctly swapped (Unschedulable-- and Gated++).
Added regression test 'TestSchedulingQueueMetrics_UngatedToGated' to verify the fix.
Signed-off-by: Vlad Shkrabkov <vshkrabkov@google.com>
2025-12-15 11:47:22 +00:00
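The metric swap described in commit 5be527b78e can be illustrated with a toy model. Everything here is an assumption made for illustration: `metricQueue`, `entry`, and the two integer counters are hypothetical, and the `gatedNow` argument stands in for a PreEnqueue plugin returning 'Wait'. The sketch only shows why removing and re-adding the Pod on an Ungated-to-Gated transition keeps the counters consistent.

```go
package main

import "fmt"

// entry records whether a queued pod is currently counted as gated.
type entry struct {
	gated bool
}

// metricQueue is a hypothetical model of the unschedulable queue together
// with the two gauges it feeds (unschedulable and gated pod counts).
type metricQueue struct {
	pods          map[string]*entry
	unschedulable int
	gated         int
}

func newMetricQueue() *metricQueue {
	return &metricQueue{pods: make(map[string]*entry)}
}

// add stores the pod and increments the gauge matching its state.
func (q *metricQueue) add(key string, gated bool) {
	q.pods[key] = &entry{gated: gated}
	if gated {
		q.gated++
	} else {
		q.unschedulable++
	}
}

// remove decrements the gauge the pod was counted under, then deletes it.
func (q *metricQueue) remove(key string) {
	e, ok := q.pods[key]
	if !ok {
		return
	}
	if e.gated {
		q.gated--
	} else {
		q.unschedulable--
	}
	delete(q.pods, key)
}

// update re-evaluates a pod. gatedNow models PreEnqueue returning 'Wait'.
func (q *metricQueue) update(key string, gatedNow bool) {
	e, ok := q.pods[key]
	if !ok {
		q.add(key, gatedNow)
		return
	}
	if !e.gated && gatedNow {
		// Ungated -> Gated: remove and re-add so the gauges are swapped
		// (Unschedulable-- and Gated++).
		q.remove(key)
		q.add(key, true)
	}
	// Otherwise the pod is already queued in the right state; nothing moves.
}

func main() {
	q := newMetricQueue()
	q.add("ns/pod-a", false)
	q.update("ns/pod-a", true)
	fmt.Println(q.unschedulable, q.gated) // 0 1
	q.remove("ns/pod-a")
	fmt.Println(q.unschedulable, q.gated) // 0 0
}
```

If `update` returned early without the swap, a later deletion would decrement a gauge that was never incremented for this Pod, which is one way a negative value could arise.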
Mohammad Varmazyar
6a1a71ddc5
Removing the redundant WasFlushedFromUnschedulable
2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
bc632c72d0
scheduler: add metric for pods scheduled after flush
...
Add counter metric to track pods that schedule immediately after
being flushed from unschedulablePods due to timeout. Uses a boolean
flag that is cleared when pods return to queue or move via events.
2025-11-24 09:38:41 +01:00
Mohammad Varmazyar
b2a399cf30
scheduler: add metric for pods scheduled after flush
...
This metric tracks pods that successfully schedule after being
flushed from unschedulablePods due to timeout. High values may
indicate missing queue hint optimizations or event handling issues.
2025-11-24 09:38:40 +01:00
Kubernetes Prow Robot
597a684bb0
Merge pull request #133172 from ania-borowiec/move_handle_and_plugin
...
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler
2025-09-08 06:05:31 -07:00
Maciej Skoczeń
4babdf8026
Fix race in movePodsToActiveOrBackoffQueue
2025-09-02 11:57:18 +00:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler
2025-09-02 09:42:53 +00:00
Kubernetes Prow Robot
5fb3296920
Merge pull request #132451 from macsko/fix_race_in_scheduler_integration_tests
...
Fix race in scheduler integration tests
2025-08-31 05:03:09 -07:00
Maciej Skoczeń
46e10103ff
Take activeQ lock for part of the Update method
2025-08-25 12:30:43 +00:00
Maciej Skoczeń
8b0b0df431
Don't run PreEnqueue when pod is activated from backoffQ
2025-08-22 12:40:41 +00:00
Maciej Skoczeń
aa59f930b3
Add lock to TestAsyncPreemption to prevent races
2025-08-05 09:43:12 +00:00
Maciej Skoczeń
c5ef720837
Fix race in scheduler integration tests
2025-08-05 09:42:52 +00:00
yliao
34a64db2c7
extended resource backed by DRA: implementation
2025-07-29 18:55:21 +00:00
Maciej Skoczeń
17d733e243
KEP-5229: Send API calls through dispatcher and cache
2025-07-25 15:35:36 +00:00
Ania Borowiec
aecd37e6fb
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 12:10:58 +00:00
Omar Nasser
45c355ca58
Move unschedulablePods struct to a separate file
2025-07-11 19:48:11 +03:00
Junhao Zou
1b730abf8d
cleanup: use HandleErrorWithXXX instead of logger.Error where errors are intentionally ignored
2025-07-08 09:34:49 +08:00
Ania Borowiec
ee8c265d35
Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework
2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503
Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
...
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes
apply review comment and fix linter warning
* update-vendor.sh
* update doc comments
* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Kensei Nakada
adc4916dfe
feat: introduce pInfo.UnschedulableCount to make the backoff calculation more appropriate
2025-05-17 12:39:58 +02:00
Kensei Nakada
d28c8cd488
fix: not removing the plugin from the unsched plugins after PreEnqueue
2025-05-07 14:12:23 +02:00
Kensei Nakada
47d296d62d
feat: introduce pInfo.GatingPlugin to filter out events more generally
2025-05-07 13:54:47 +02:00
Ania Borowiec
17acc4a5ee
Move queue.Done() before Prebind, add tests
2025-03-20 22:14:36 +00:00
Maciej Skoczeń
c7919f5e22
Pop from the backoffQ when the activeQ is empty
2025-03-20 16:07:13 +00:00
Maciej Skoczeń
e367dca6c5
Change backoffQ less function to order pods by priority in windows
2025-03-19 13:04:15 +00:00
Maciej Skoczeń
9df0f6b604
Call PreEnqueue plugins before adding pod to backoffQ
2025-03-14 08:47:40 +00:00
Maciej Skoczeń
6975572a80
Add missing increments of queue_incoming_pods_total metric in scheduling queue
2025-03-04 12:37:22 +00:00
Maciej Skoczeń
0f24b9ff45
Split backoffQ into backoffQ and errorBackoffQ in scheduler
2025-02-24 14:11:26 +00:00
Kensei Nakada
105d489aa4
chore: wording
2024-11-07 14:09:35 +09:00
Kensei Nakada
ce377efa00
fix: improve logs
2024-11-07 14:09:35 +09:00
Kensei Nakada
49135d6173
fix: take QHint disable scenario into consideration
2024-11-07 14:09:35 +09:00
Kensei Nakada
623b2a20d2
fix: handle Activate event properly
2024-11-07 14:09:35 +09:00
Kensei Nakada
02459ca59c
fix: register the event in in-flight as necessary at Activate
2024-11-07 14:09:35 +09:00
Kensei Nakada
089457e908
fix: check correctly if the event is scale down
...
Signed-off-by: Kensei Nakada <handbomusic@gmail.com>
2024-10-22 10:01:20 +09:00
Kensei Nakada
83f9e4b6df
cleanup: remove event list
2024-10-18 11:10:10 +10:00
Kensei Nakada
a2b3a4f4dc
chore: ensure the scheduler handles events before checking the pod position
2024-10-06 21:06:45 +09:00
Kensei Nakada
24a14aa810
fix: run a test for requeueing with PreFilterResult correctly
2024-09-07 23:52:45 +09:00
Kubernetes Prow Robot
f12334be03
Merge pull request #126962 from sanposhiho/memory-leak-scheduler
...
fix(scheduler): fix a possible memory leak for QueueingHint
2024-09-06 19:01:25 +01:00
Kubernetes Prow Robot
52d4972901
Merge pull request #127109 from sanposhiho/precheck-move
...
feat: disable preCheck when QHint is enabled
2024-09-05 17:19:57 +01:00
Kensei Nakada
0b71f256a8
fix(scheduler): fix a possible memory leak for QueueingHint
2024-09-05 12:13:05 +09:00
Kubernetes Prow Robot
05df9f4675
Merge pull request #127052 from sanposhiho/add-inflight-event-metric
...
feat(scheduler): support inflight_events metric
2024-09-04 19:56:19 +01:00
Kensei Nakada
4ee1394b71
feat: disable preCheck when QHint is enabled
2024-09-04 17:43:00 +09:00
Kensei Nakada
110d28355d
feat(scheduler): support inflight_events metric
2024-09-02 10:16:43 +09:00
Kubernetes Prow Robot
59051eb003
Merge pull request #126029 from sanposhiho/backoff-preenqueue
...
scheduler: impose a backoff penalty on gated Pods
2024-08-28 21:58:01 +01:00
Kensei Nakada
b5a156971f
scheduler: impose a backoff penalty on gated Pods
2024-08-27 09:57:59 +09:00
Kensei Nakada
baf69640d3
fix(scheduler_one): call Done() as soon as possible
2024-08-27 09:30:47 +09:00