1373 commits

Author SHA1 Message Date
Kubernetes Prow Robot
f86edc2665
Merge pull request #133929 from huww98/fix-pv-cache-race-v2
scheduler/volumebinding: passive assume cache
2025-10-02 08:42:56 -07:00
Kubernetes Prow Robot
4a1558c545
Merge pull request #133967 from pohly/dra-allocator-selection
DRA: allocator selection
2025-09-30 08:24:18 -07:00
Patrick Ohly
60eeaa6ebd DRA scheduler: add unit test for allocator selection
This prevents the mistake from 1.34 where the default-on
DRAResourceClaimDeviceStatus feature caused the use of the experimental
allocator implementation. The test fails without a fix for that.
2025-09-30 16:53:38 +02:00
Patrick Ohly
7f57730ba4 DRA scheduler: fix selection of "incubating" allocator implementation
In 1.34, the default feature gate selection picked the "experimental" allocator
implementation when it should have used the "incubating" allocator. No harm
came from that because the experimental allocator has all the necessary if
checks to disable the extra code and no bugs were introduced when implementing
it, but it means that our safety net wasn't there when we expected it to be.

The reason is that the "DRAResourceClaimDeviceStatus" feature gate is on by
default and was only listed as supported by the experimental implementation.
This could be fixed by listing it as supported also by the other
implementation, but that would be a bit odd because there is nothing to support
for it (the reason why this was missed in 1.34!). Instead, the allocator
features are now only indirectly related to feature gates, with a single
boolean controlling the implementation of binding conditions.
2025-09-30 16:53:38 +02:00
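
The selection problem described in the commit above can be illustrated with a minimal sketch. It assumes that each allocator implementation declares the features it supports and that the first (most conservative) implementation covering the requested features wins. All type, field, and function names below are hypothetical and are not taken from the Kubernetes source; only the "incubating" and "experimental" labels and the idea of a single boolean for binding conditions come from the commit message.

```go
package main

import "fmt"

// Features is a hypothetical, simplified set of allocator capabilities.
// It is deliberately decoupled from feature gates: a default-on gate that
// needs no allocator support simply has no field here.
type Features struct {
	BindingConditions bool // the single boolean mentioned in the commit message
	OtherFeature      bool // hypothetical placeholder for any other capability
}

// implementation describes one allocator variant (illustrative only).
type implementation struct {
	name      string
	supported Features
}

// covers reports whether every requested feature is supported.
func (i implementation) covers(want Features) bool {
	if want.BindingConditions && !i.supported.BindingConditions {
		return false
	}
	if want.OtherFeature && !i.supported.OtherFeature {
		return false
	}
	return true
}

// implementations is ordered from most to least conservative.
var implementations = []implementation{
	{name: "incubating", supported: Features{OtherFeature: true}},
	{name: "experimental", supported: Features{OtherFeature: true, BindingConditions: true}},
}

// selectAllocator returns the first implementation that covers want.
func selectAllocator(want Features) (string, error) {
	for _, impl := range implementations {
		if impl.covers(want) {
			return impl.name, nil
		}
	}
	return "", fmt.Errorf("no allocator implementation supports %+v", want)
}

func main() {
	for _, want := range []Features{{}, {OtherFeature: true}, {BindingConditions: true}} {
		name, err := selectAllocator(want)
		fmt.Println(name, err)
	}
}
```

In this sketch, listing a feature only under "experimental" forces that implementation whenever the feature is requested, which is the kind of mistake the commit describes for a default-on gate.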
Patrick Ohly
b5bcac998d DRA scheduler: clean up feature gate handling
Copying from feature.Features to new fields in the plugin got a bit silly with
the long list of features that we have now. Embedding feature.Features is
simpler.

Two fields in feature.Features weren't named according to the feature gate, now
they are named consistently and the fields are sorted.
2025-09-30 16:53:38 +02:00
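
As a rough illustration of the cleanup above, the sketch below embeds a features struct in a plugin struct instead of copying each field. The field names, the plugin type, and the constructor are hypothetical stand-ins, not the actual feature.Features definition.

```go
package main

import "fmt"

// Features is a hypothetical stand-in for feature.Features; the real
// struct has many more fields, named after their feature gates.
type Features struct {
	EnableGateA bool
	EnableGateB bool
}

// plugin embeds Features, so every flag is promoted and readable as
// p.EnableGateA without a per-field copy in the constructor.
type plugin struct {
	Features
	name string
}

// newPlugin stores the whole struct at once; adding a field to Features
// requires no change here.
func newPlugin(name string, fts Features) *plugin {
	return &plugin{Features: fts, name: name}
}

// describe shows that promoted fields are accessed directly.
func (p *plugin) describe() string {
	return fmt.Sprintf("%s: gateA=%t gateB=%t", p.name, p.EnableGateA, p.EnableGateB)
}

func main() {
	p := newPlugin("example", Features{EnableGateA: true})
	fmt.Println(p.describe())
}
```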
hojinchoi
7028ba09db fix: duplicated 'the' in comment 2025-09-18 18:11:44 +09:00
胡玮文
e39ed4a4b4 scheduler/volumebinding: add test for PVAssumeCache 2025-09-13 19:23:03 +08:00
胡玮文
bbee7b9d6b scheduler/volumebinding: rename passive_assume_cache_test.go 2025-09-13 13:16:39 +08:00
胡玮文
4b0eff59c0 scheduler/volumebinding: target AssumeCache UT generic passiveAssumeCache
And removing duplicate tests
2025-09-13 13:15:24 +08:00
胡玮文
5a708a7ff0 scheduler/volumebinding: remove Get[API]{PV,PVC}
should be replaced by generic Get[APIObj]
2025-09-13 00:26:46 +08:00
胡玮文
ed19492dc2 scheduler/volumebinding: passive assume cache
Currently the volume and dynamic-resource plugins share an AssumeCache
implementation. However, their use cases differ significantly.  DRA
calls Assume() on objects returned by APIServer, but volume calls Assume() on
objects yet to be sent to the APIServer.

VolumeBinding plugin only makes one update request, while DynamicResource makes
2 requests (add finalizer then update allocation status).  Taking advantage of
this, currently the volume cache is simpler:

1. Reserve: assume PV/PVC will be updated
2. PreBind: really send the update request
3. AssumeCache receives an update from informer and overwrites the assumed state.
   a. if PreBind succeeded, this will surely include the update from step 2.
   b. if PreBind is not finished yet, and this is an irrelevant update, it is safe to
      overwrite the assumed state, because our update in PreBind will surely fail with Conflict.

While for DynamicResource:

1. Reserve: add devices to inFlightAllocations
2. PreBind:
   a. send the 2 update requests
   b. add the returned object into AssumeCache
   c. AssumeCache dispatches events synchronously to update allocatedDevices
   d. remove devices from inFlightAllocations

DynamicResource needs some features from AssumeCache that are not necessary for VolumeBinding:
1. DynamicResource needs strictly ordered update events to update allocatedDevices,
   including those from Assume() and Restore()
2. DynamicResource needs to compare ResourceVersion to prevent the assumed state from being
   overwritten by an older version from informer.  While this works, the doc[1] says:
   "you must not compare resource versions for greater-than or less-than relationships".

Given these differences, it can be beneficial to fork another, simpler
AssumeCache for the VolumeBinding plugin. Because it does not need to send
events, the lite AssumeCache is a passive component. It only records the
assumed version without copying all objects from informer into its local cache.
When reading, we read from both informer and local cache, so it is always
up-to-date with informer, with no need to wait for an event handler.

This resolves a race condition where AssumeCache and scheduler queue both
receive events from informer. When a pod is being scheduled due to a PV update
event, the PVCache may not be updated yet because it has not processed the
relevant event.

The passive version still listens to events from informer, but only for cleaning up
its local cache to save memory.

[1]: https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions
2025-09-13 00:26:45 +08:00
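
The passive design described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the upstream implementation: a plain map stands in for the informer store, the object type and all method names are hypothetical, and the rule "the assumed object wins only while the informer still holds the exact version it was derived from" is one assumed way to realize "any informer update overwrites the assumed state" using equality checks only.

```go
package main

import (
	"fmt"
	"sync"
)

// object is a hypothetical stand-in for a PV or PVC.
type object struct {
	Name            string
	ResourceVersion string
	Phase           string
}

// passiveAssumeCache keeps only assumed objects; everything else is read
// straight from the informer store, so reads never lag behind it.
type passiveAssumeCache struct {
	mu       sync.Mutex
	informer map[string]object // stand-in for the informer's store
	assumed  map[string]object // objects with a pending assumption
}

func newPassiveAssumeCache(informer map[string]object) *passiveAssumeCache {
	return &passiveAssumeCache{informer: informer, assumed: map[string]object{}}
}

// Assume records the expected future state. obj keeps the ResourceVersion
// of the informer object it was derived from.
func (c *passiveAssumeCache) Assume(obj object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.assumed[obj.Name] = obj
}

// Restore drops the assumption, for example after a failed Reserve.
func (c *passiveAssumeCache) Restore(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.assumed, name)
}

// Get consults both sources. Versions are compared for equality only,
// never for ordering.
func (c *passiveAssumeCache) Get(name string) (object, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	latest, ok := c.informer[name]
	if !ok {
		return object{}, fmt.Errorf("object %q not found", name)
	}
	if a, found := c.assumed[name]; found && a.ResourceVersion == latest.ResourceVersion {
		return a, nil
	}
	return latest, nil
}

// onInformerUpdate is pure cleanup: it frees stale assumptions to save
// memory and never copies informer objects into the cache.
func (c *passiveAssumeCache) onInformerUpdate(obj object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if a, found := c.assumed[obj.Name]; found && a.ResourceVersion != obj.ResourceVersion {
		delete(c.assumed, obj.Name)
	}
}

func main() {
	store := map[string]object{"pv-1": {Name: "pv-1", ResourceVersion: "10", Phase: "Available"}}
	cache := newPassiveAssumeCache(store)

	cache.Assume(object{Name: "pv-1", ResourceVersion: "10", Phase: "Bound"})
	got, _ := cache.Get("pv-1")
	fmt.Println(got.Phase) // assumed state

	// The informer store changes before any event handler runs.
	store["pv-1"] = object{Name: "pv-1", ResourceVersion: "11", Phase: "Released"}
	got, _ = cache.Get("pv-1")
	fmt.Println(got.Phase) // informer state, no handler needed
}
```

Because Get reads the informer store directly, a reader sees the new version even if onInformerUpdate has not run yet, which is the property the commit message relies on to avoid the race with the scheduler queue.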
胡玮文
c385a229d4 scheduler/volumebinding: introduce testInformer 2025-09-12 15:11:52 +08:00
胡玮文
eaf87d5907 scheduler/volumebinding: pass testing.T to helper 2025-09-12 14:29:02 +08:00
胡玮文
dce23dac03 scheduler/volumebinding: use subtest 2025-09-12 14:23:06 +08:00
Kubernetes Prow Robot
d602326b87
Merge pull request #133363 from yliaog/implicit
Allow implicit extended resource name to be used no matter explicit extendedResourceName field is set or not in device class
2025-09-11 13:40:07 -07:00
yliao
74cf1db218 sort the device requests in the extended resource claim spec.
removed the sortClaim in the unit test.
2025-09-11 16:55:58 +00:00
yliao
79f8d1b1c5 fixed a bug so that the implicit extended resource name can always be used,
whether or not the explicit extendedResourceName field in the device class is set.
2025-09-10 14:10:40 +00:00
Maciej Skoczeń
3dfcda9afd Fix minor inconsistencies in scheduler 2025-09-10 11:40:10 +00:00
Ania Borowiec
fadb40199f
Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler 2025-09-02 09:42:53 +00:00
Kubernetes Prow Robot
b94b6ece10
Merge pull request #133707 from ania-borowiec/fitsports
Remove use of pkg/scheduler/framework.NodeInfo in node_ports.go
2025-08-31 19:25:11 -07:00
Kubernetes Prow Robot
871857b0d0
Merge pull request #133608 from yliaog/flake
added resourceClaimModified to bindClaim to decide whether to update assume cache
2025-08-29 15:23:08 -07:00
yliao
bf13cd1b81 added resourceClaimModified to bindClaim to decide whether to update assume cache 2025-08-29 16:12:55 +00:00
Ania Borowiec
b012e16b47
Remove use of pkg/scheduler/framework.NodeInfo in node_ports.go 2025-08-27 13:30:45 +00:00
Ania Borowiec
3c00c3cb29
Move GetAffinityTerms functions from pkg/scheduler/framework to staging repo 2025-08-26 13:39:49 +00:00
Abu Kashem
747a295cac
fix flake in dra test 'TestPlugin'
TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flakes due to the assume cache not having been synced with the initial
store. The test waits until the registered handler used by the
assume cache has synced before proceeding with the test
2025-08-18 15:54:03 -04:00
Abu Kashem
c8ab780edb
dra plugin: assume claim after api call in bindClaim 2025-08-13 16:35:35 -04:00
yliao
2a026f6d65 1/ added retries to AssumeClaimAfterAPICall for the object which is not present in the cache (dynamicresources.go)
2/ modified the assume cache verification to not error out as long as
the expected claim is in the cache, regardless of whether its latest object
and API object differ. (dynamicresources_test.go).
3/ fixed nil panic as seen from https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133321/pull-kubernetes-integration/1952472629470302208
2025-08-06 07:08:58 +00:00
yliao
0a12f00e9d
fix nil panic in hasBindingConditions, it cannot assume claim has allocations 2025-07-30 14:44:41 +09:00
Sunyanan Choochotkaew
7f052afaef
KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
Sunyanan Choochotkaew
5ad969588d
KEP-5075: API updates
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:26:40 +09:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Kubernetes Prow Robot
e2ab840708
Merge pull request #130160 from KobayashiD27/dra-device-binding-conditions
Implement DRA Device Binding Conditions (KEP-5007)
2025-07-29 07:34:26 -07:00
Kobayashi,Daisuke
e8c3af1f5c KEP-5007 DRA Device Binding Conditions: Implement scheduler logic 2025-07-29 11:34:30 +00:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
Kubernetes Prow Robot
a11bc701e8
Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 09:38:27 -07:00
Ania Borowiec
aecd37e6fb
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Kubernetes Prow Robot
89a01ec72a
Merge pull request #133019 from pohly/dra-scheduler-plugin-owners
DRA scheduler plugin: add pohly as approver
2025-07-24 03:42:33 -07:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without v1 won't work.
2025-07-24 08:33:45 +02:00
Ed Bartosh
c2a06e7912 DRA: skip flaky test case on Windows
Added a skipOnWindows flag to DynamicResources scheduler test case
to skip test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
2025-07-23 11:06:11 +03:00
Kensei Nakada
4b8dd9612f cleanup: remove example plugins 2025-07-19 13:08:34 +09:00
Patrick Ohly
bc338e7505 DRA scheduler: implement filter timeout and cancellation
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.

We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
return the context error. The scheduler plugin can then convert that into
"unschedulable".

The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
2025-07-17 21:18:28 +02:00
Patrick Ohly
025c606e39 DRA scheduler: add plugin configuration
The only option is the filter timeout.
The implementation of it follows in a separate commit.
2025-07-17 16:47:47 +02:00
Patrick Ohly
ee38a00131 DRA scheduler: add DRASchedulerFilterTimeout feature gate
Initializing the scheduler Features struct will be needed in different places,
therefore NewSchedulerFeaturesFromGates gets introduced. Besides, having it
next to the struct makes it easier to add new features.

The DRASchedulerFilterTimeout feature gate simplifies disabling the timeout
because setting a feature gate is often easier than modifying the scheduler
configuration with a zero timeout value.

The timeout and feature gate are new. The gate starts as beta and is enabled by
default, which is consistent with the "smaller changes with low enough risk
that still may need to be disabled..." guideline.
2025-07-17 16:47:47 +02:00
Patrick Ohly
a2a3839a8e DRA scheduler: add pohly as approver
This is meant for simple changes, like code cleanup or API changes of the
allocator code. For more complex changes and new features, SIG Scheduling
approvers will be required to approve, as before.
2025-07-17 09:43:44 +02:00
yliao
dd3691b169 refactor allocator, removed claimsToAllocate from NewAllocator(), instead, passed it through Allocate() 2025-07-16 15:11:11 +00:00
Kubernetes Prow Robot
ab685237f0
Merge pull request #132391 from sanposhiho/pre-bind-pre-flight
feat: add PreBindPreFlight and implement in in-tree plugins
2025-07-15 04:06:23 -07:00
Kubernetes Prow Robot
e3b20c07d6
Merge pull request #132870 from pohly/dra-allocator
DRA: refactor claim allocator
2025-07-15 01:28:29 -07:00
Patrick Ohly
5caf7bca15 DRA allocator: refactor code
The goal is to maintain different versions of the allocator logic. We already
had one incident where adding an alpha feature caused a regression even when
it was disabled. Not everything can be implemented within obviously correct if
branches.

This also opens the door for implementing different alternatives.

The code just gets moved around for now.
2025-07-10 17:34:21 +02:00
Pawel Mechlinski
f2b24b9849 Increase verbosity of frequently printed loglines in binder plugin 2025-07-09 12:10:10 +00:00
Kensei Nakada
ebae419337 feat: add PreBindPreFlight and implement in in-tree plugins 2025-07-05 17:14:21 -07:00