Commit graph

1596 commits

Author SHA1 Message Date
Kubernetes Prow Robot
5a50e0e229
Merge pull request #138497 from cupnes/kep-4049-beta
KEP-4049: Update tests for the StorageCapacityScoring feature for beta
2026-06-18 16:06:43 +05:30
Yuma Ogami
258b02d07e test: add StorageCapacityScoring=off variants to TestVolumeBinding 2026-06-18 08:40:36 +00:00
Antoni Basista
61dc7df681 Make minCount mutable in Workload and PodGroup APIs 2026-06-17 12:44:49 +00:00
Maciej Skoczeń
54ca619d4b Merge GangScheduling and WorkloadAwarePreemption feature gates into GenericWorkload 2026-06-15 11:42:10 +00:00
yliao
f8e92a5b5e Fix unit test errors: cannot set feature gate DRAExtendedResource to false because the feature is locked to true 2026-06-02 16:12:22 +00:00
Kubernetes Prow Robot
9f192ba95a
Merge pull request #139418 from nojnhuh/dra-shared-pending-claim-accounting
DRA: Fix shared in-flight allocation accounting in error paths
2026-06-02 12:06:52 +05:30
Kubernetes Prow Robot
14f9f7e9e5
Merge pull request #139332 from johnahull/fix/dra-list-type-allocator-wiring
DRA: pass ListTypeAttributes to AllocatorFeatures
2026-06-02 01:57:06 +05:30
Jon Huhn
b276de1077 DRA: Fix shared in-flight allocation accounting in error paths
In the DRA scheduler plugin, `SignalClaimPendingAllocation` and
`MaybeRemoveClaimPendingAllocation` coordinate in-flight allocations for
claims shared between Pods when scheduling a PodGroup. `Signal` is
called in Reserve to store an in-flight allocation and mark a claim as
being shared. `MaybeRemove` is called in Unreserve to release a Pod's
share of the claim and remove the in-flight allocation if no more Pods
are sharing the claim. `MaybeRemove` is also called at the end of
PreBind so successful binding cycles remove the in-flight allocation in
favor of the allocation stored in the API.

When a Pod transiently fails in PreBind, `MaybeRemove` is invoked twice:
first in PreBind, and again in Unreserve. Since shares are tracked with
a counter, these double-counted releases can cause the in-flight claim
to be removed from the cache prematurely and poison the AssumeCache with
a claim that has both an updated resourceVersion and no allocation,
halting progress.

This change refactors the `MaybeRemove` call in PreBind to only fire
when PreBind has finished successfully. When Pods sharing a claim are
binding concurrently, each one will either:

- Update the claim's allocation successfully and force-remove the
  in-flight allocation, or
- Fail to bind and wait until Unreserve to call `MaybeRemove`
2026-06-01 13:29:39 -05:00
John Hull
6fb92ce4aa Pass ListTypeAttributes to AllocatorFeatures to enable experimental allocator
The DRAListTypeAttributes feature gate was enabled but not passed
through AllocatorFeatures(), so the scheduler always selected the
incubating allocator, which doesn't support list-type attribute
intersection matching (KEP-5491). The experimental allocator has
the intersection logic but was never selected.
2026-05-29 10:30:03 -05:00
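The wiring bug can be illustrated with a hypothetical sketch. `Features`, `allocatorFeatures`, and `selectAllocator` are invented names, and the selection rule (pick the experimental allocator only when a feature that only it implements is requested) is an assumption drawn from the commit message, not the scheduler's actual code. If the gate is never copied into the features struct, selection silently falls back to the incubating allocator.

```go
package main

import "fmt"

// Features is a hypothetical subset of the allocator feature set.
type Features struct {
	ListTypeAttributes bool
}

// allocatorFeatures copies enabled gates into Features (hypothetical).
// Dropping the ListTypeAttributes line reproduces the bug: the gate is on,
// but the allocator selection never learns about it.
func allocatorFeatures(gates map[string]bool) Features {
	return Features{
		ListTypeAttributes: gates["DRAListTypeAttributes"],
	}
}

// selectAllocator is an assumed selection rule: use the experimental
// allocator only when a feature that only it implements is requested.
func selectAllocator(f Features) string {
	if f.ListTypeAttributes {
		return "experimental"
	}
	return "incubating"
}

func main() {
	gates := map[string]bool{"DRAListTypeAttributes": true}
	fmt.Println(selectAllocator(allocatorFeatures(gates))) // experimental
}
```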
Kubernetes Prow Robot
df31649416
Merge pull request #139017 from johnbelamaric/fix-dra-scoring-bug
Fix dra scoring bug with mixed allocated and unallocated claims
2026-05-28 19:08:50 +05:30
Kubernetes Prow Robot
ae84ac1a16
Merge pull request #138274 from wtravO/wtravo/placement-cycle-state
Add PlacementCycleState to WAS scheduler framework
2026-05-28 16:28:48 +05:30
wtravO
b968273f03 Pass PlacementCycleState to PlacementFeasible plugins 2026-05-27 10:44:44 -04:00
wtravO
bd97e3f190 expose PodGroupCycleState via PlacementCycleState 2026-05-27 10:25:23 -04:00
Maciej Skoczeń
8eb66b73ef Add support for PodGroups in scheduling queue 2026-05-27 13:06:13 +00:00
Kubernetes Prow Robot
338e80805f
Merge pull request #138643 from brejman/gang-early-return-core
Add early-return based on mincount to pod group scheduling algorithm
2026-05-27 17:36:43 +05:30
Maciej Wyrzuc
29ddc2907b Wrap errors from pod group preemption 2026-05-26 12:54:46 +00:00
Yuma Ogami
ebc561aaa5 test: update default shape in volumebinding tests to space-spreading
As StorageCapacityScoring graduates to beta, its default shape is
space-spreading (prefer nodes with more available storage capacity).
However, the test code still treated space-packing (prefer nodes
with less available storage capacity) as the default, a remnant of
the VolumeCapacityPriority era before that feature was absorbed into
StorageCapacityScoring.

This commit fixes that by aligning the default shape in the tests with
the actual default of StorageCapacityScoring.
2026-05-26 05:25:13 +00:00
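To make "spreading" versus "packing" concrete, here is a sketch assuming the shape is a list of (utilization percent, score) points interpolated linearly. The point values below are illustrative assumptions, not the defaults taken from the source. A spreading shape gives the highest score at low utilization, that is, to nodes with more free capacity.

```go
package main

import "fmt"

// point is a (utilization %, score) pair on a scoring shape.
type point struct{ util, score int }

// Illustrative shapes (assumed values, not the shipped defaults).
var (
	spreading = []point{{0, 10}, {100, 0}} // more free capacity -> higher score
	packing   = []point{{0, 0}, {100, 10}} // less free capacity -> higher score
)

// scoreFor linearly interpolates the score for a utilization percentage.
func scoreFor(shape []point, util int) int {
	if util <= shape[0].util {
		return shape[0].score
	}
	for i := 1; i < len(shape); i++ {
		a, b := shape[i-1], shape[i]
		if util <= b.util {
			return a.score + (b.score-a.score)*(util-a.util)/(b.util-a.util)
		}
	}
	return shape[len(shape)-1].score
}

func main() {
	// A node at 20% storage utilization versus one at 80%.
	fmt.Println(scoreFor(spreading, 20), scoreFor(spreading, 80)) // 8 2
	fmt.Println(scoreFor(packing, 20), scoreFor(packing, 80))     // 2 8
}
```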
Bartosz
c91fee448c Use PlacementFeasible instead of Permit in PodGroup scheduling cycle 2026-05-25 13:10:42 +00:00
Bartosz
1e1bad1dde Add PlacementFeasible plugin to support early gang termination 2026-05-25 10:36:03 +00:00
Antoni Basista
8b8aa9c52b Add support for NNN in podgrouppreemption 2026-05-25 08:54:22 +00:00
dom4ha
8a52fb2ea9 Migrate references to v1alpha3 in tests, controllers, and remaining files 2026-05-22 12:50:19 +00:00
dom4ha
43ebd00b66 Migrate internal references from v1alpha2 to v1alpha3 in scheduler and admission plugins 2026-05-22 12:50:19 +00:00
Bartosz
dd1f040ac9 Fix queue hint for anti-affinities 2026-05-20 12:30:41 +00:00
John Belamaric
e54d188fba Fix DRA scoring bug with mixed allocated and unallocated claims 2026-05-15 17:58:03 +00:00
Kubernetes Prow Robot
9f8e03c4d0
Merge pull request #138710 from mm4tt/fix-preemptor-eligibility
scheduler: match preemptor eligibility behavior in pod group preemption
2026-05-15 19:10:37 +05:30
Matt Matejczyk
259b504c3b scheduler: match preemptor eligibility behavior in pod group preemption 2026-05-15 11:34:18 +00:00
Kubernetes Prow Robot
3647aa0125
Merge pull request #138983 from vishalanarase/test/dynamicresources-inflight-signal-idempotence
test(dynamicresources): assert in-flight claim is not replaced on second signal
2026-05-15 14:02:27 +05:30
Kubernetes Prow Robot
3758f08707
Merge pull request #138951 from sujoshua/master
Fix ImageLocality scoring for image volumes
2026-05-15 13:00:28 +05:30
Joshua Su
b96072e372 Fix ImageLocality scoring for image volumes
Include image volumes in the image source count used by calculatePriority, so
the raw image score and max threshold are based on the same image sources.

Update ImageLocality tests to cover ImageVolume scoring against equivalent
regular container images.

Signed-off-by: Joshua Su <i@joshua.su>
2026-05-15 14:29:19 +08:00
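A simplified sketch of why the source count matters, modeled on the general form of a clamped-and-scaled locality score. The constants and the `calculatePriority` signature here are assumptions for illustration, and the sketch omits the plugin's per-image weighting by how widely each image is spread across nodes. If an image volume adds to the summed score but not to `numSources`, the maximum threshold is too low and the node is over-scored.

```go
package main

import "fmt"

// Assumed constants for illustration.
const (
	mb                    int64 = 1024 * 1024
	minThreshold                = 23 * mb
	maxContainerThreshold       = 1000 * mb
	maxNodeScore          int64 = 100
)

// calculatePriority clamps the summed size of images already present on
// the node to [minThreshold, maxThreshold] and scales it to
// [0, maxNodeScore]. maxThreshold grows with the number of image sources.
func calculatePriority(sumScores int64, numSources int) int64 {
	maxThreshold := maxContainerThreshold * int64(numSources)
	if sumScores < minThreshold {
		sumScores = minThreshold
	} else if sumScores > maxThreshold {
		sumScores = maxThreshold
	}
	return maxNodeScore * (sumScores - minThreshold) / (maxThreshold - minThreshold)
}

func main() {
	present := 500*mb + 500*mb // one container image + one image volume
	fmt.Println(calculatePriority(present, 1)) // 100: volume not counted as a source
	fmt.Println(calculatePriority(present, 2)) // 49: same as two container images
}
```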
Vishal Anarase
90ac6285cd test(dynamicresources): assert in-flight claim is not replaced on second signal
Signed-off-by: Vishal Anarase <iamvishalanarase@gmail.com>
2026-05-13 17:22:28 +05:30
Patrick Ohly
341b7d65b6 DRA: harmonize ResourceClaim creation metric
Both kube-controller-manager and kube-scheduler create ResourceClaims. Using
the same metric (sub-system: "dynamic_resource_allocation", name:
"resourceclaim_creates_total") in both components simplifies aggregation across
the entire cluster.
2026-05-11 12:15:14 +02:00
Antoni Basista
2c5a15d143 Add retrying for Bind API calls. With PodGroup scheduling, a transient connection interruption should not stop the binding of a whole group of Pods 2026-05-07 16:03:05 +00:00
Jarosław Dzikowski
b7edf53ab3 Try to schedule as many pods as possible during workload preemption algorithm 2026-05-07 11:52:33 +00:00
Kubernetes Prow Robot
c485ef21ab
Merge pull request #136709 from gzb1128/dra-cel-no-such-key-error-enhancement
DRA: improve CEL error message for "no such key" errors
2026-04-30 00:21:23 +05:30
Kubernetes Prow Robot
caeae2cfa0
Merge pull request #138522 from bart0sh/PR232-DRA-remove-nil-check
scheduler/dra: remove redundant nil check
2026-04-24 14:36:46 +05:30
Kubernetes Prow Robot
b11a275f31
Merge pull request #138544 from bart0sh/PR233-DRA-findExtendedResourceClaim-fix-nil-pointer-deref
DRA: fix possible nil pointer deref
2026-04-24 13:42:45 +05:30
Ed Bartosh
305b06534e Update pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2026-04-23 17:02:12 +03:00
Ed Bartosh
bd08a5d4dc scheduler/dra: fix possible nil pointer dereference
Added nil guard for OwnerReference.Controller to avoid panic
if a ResourceClaim has an OwnerReference with the Controller field
unset.
2026-04-23 14:01:22 +03:00
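`OwnerReference.Controller` is an optional `*bool`, so it must be checked for nil before it is dereferenced. The sketch below uses a local copy of the relevant fields (an assumption to keep it self-contained) and a hypothetical `controllerOf` helper; apimachinery's `metav1.GetControllerOf` applies the same nil-then-true check.

```go
package main

import "fmt"

// OwnerReference mirrors the relevant fields of metav1.OwnerReference,
// where Controller is an optional *bool.
type OwnerReference struct {
	Kind       string
	Name       string
	Controller *bool
}

// controllerOf returns the owner reference marked as controller, or nil.
// Checking Controller != nil before dereferencing avoids the panic.
func controllerOf(refs []OwnerReference) *OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

func main() {
	yes := true
	refs := []OwnerReference{
		{Kind: "Pod", Name: "unset"}, // Controller left nil
		{Kind: "Pod", Name: "owner", Controller: &yes},
	}
	fmt.Println(controllerOf(refs).Name) // owner
}
```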
Jarosław Dzikowski
50f08420d3 Graduate SchedulerQueueingHints feature gate 2026-04-23 10:17:33 +00:00
Kubernetes Prow Robot
37cf8a4753
Merge pull request #138515 from bart0sh/PR231-DRA-extended-resources-check-for-not-found-claim
DRA: tolerate 404 when deleting extended resource claim
2026-04-23 13:20:46 +05:30
Kubernetes Prow Robot
b36864202b
Merge pull request #137755 from HirazawaUi/remove-SidecarContainers-feature-gate
Remove SidecarContainers feature gate
2026-04-23 08:16:45 +05:30
Kubernetes Prow Robot
2f77eec6c8
Merge pull request #138442 from x13n/patch-6
Avoid calling klog.FromContext twice in TaintToleration.Filter()
2026-04-23 07:21:40 +05:30
Kubernetes Prow Robot
737be52413
Merge pull request #137814 from dims/dsrinivas/0008-volumebinding-cache-sync
volumebinding: give binder test caches more time to sync
2026-04-23 04:15:15 +05:30
Ed Bartosh
e6e9fce2c6 scheduler/dra: remove redundant nil check
currentClaimStatus.Resources is initialized as an empty map when the
struct is constructed, so the nil check is dead code.
2026-04-22 17:12:43 +03:00
Ed Bartosh
c34d0b4a5d DRA: tolerate 404 when deleting extended resource claim
Don't return an error if the claim is already deleted.
2026-04-22 14:15:32 +03:00
Daniel Kłobuszewski
750e654597 Avoid calling klog.FromContext twice in TaintToleration.Filter() 2026-04-17 13:38:45 +02:00
iomarsayed
7a54834917 Split pod resource types so plugins can register only for the cluster events they require 2026-04-17 08:29:24 +00:00
gzb1128
5e2d5b9a62 DRA: add hint for CEL "no such key" errors
When CEL expressions access non-existent map keys, add a helpful hint
suggesting optional chaining (.? followed by orValue()) or the has() macro.
2026-04-17 11:08:18 +08:00
Kubernetes Prow Robot
da97d71f14
Merge pull request #137897 from nojnhuh/dra-gang
scheduler: fix race in DRA pending allocation sharing
2026-03-24 23:40:18 +05:30
Jon Huhn
61cf993c6b scheduler: fix race in DRA pending allocation sharing 2026-03-24 12:07:31 -05:00