kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-03-23 19:04:33 -04:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	f86edc2665	Merge pull request #133929 from huww98/fix-pv-cache-race-v2 scheduler/volumebinding: passive assume cache	2025-10-02 08:42:56 -07:00
Kubernetes Prow Robot	4a1558c545	Merge pull request #133967 from pohly/dra-allocator-selection DRA: allocator selection	2025-09-30 08:24:18 -07:00
Patrick Ohly	60eeaa6ebd	DRA scheduler: add unit test for allocator selection This prevents the mistake from 1.34 where the default-on DRAResourceClaimDeviceStatus feature caused the use of the experimental allocator implementation. The test fails without a fix for that.	2025-09-30 16:53:38 +02:00
Patrick Ohly	7f57730ba4	DRA scheduler: fix selection of "incubating" allocator implementation In 1.34, the default feature gate selection picked the "experimental" allocator implementation when it should have used the "incubating" allocator. No harm came from that because the experimental allocator has all the necessary if checks to disable the extra code and no bugs were introduced when implementing it, but it means that our safety net wasn't there when we expected it to be. The reason is that the "DRAResourceClaimDeviceStatus" feature gate is on by default and was only listed as supported by the experimental implementation. This could be fixed by listing it as supported also by the other implementation, but that would be a bit odd because there is nothing to support for it (the reason why this was missed in 1.34!). Instead, the allocator features are now only indirectly related to feature gates, with a single boolean controlling the implementation of binding conditions.	2025-09-30 16:53:38 +02:00
Patrick Ohly	b5bcac998d	DRA scheduler: clean up feature gate handling Copying from feature.Features to new fields in the plugin got a bit silly with the long list of features that we have now. Embedding feature.Features is simpler. Two fields in feature.Features weren't named according to the feature gate, now they are named consistently and the fields are sorted.	2025-09-30 16:53:38 +02:00
hojinchoi	7028ba09db	fix: duplicated 'the' in comment	2025-09-18 18:11:44 +09:00
胡玮文	e39ed4a4b4	scheduler/volumebinding: add test for PVAssumeCache	2025-09-13 19:23:03 +08:00
胡玮文	bbee7b9d6b	scheduler/volumebinding: rename passive_assume_cache_test.go	2025-09-13 13:16:39 +08:00
胡玮文	4b0eff59c0	scheduler/volumebinding: target AssumeCache UT generic passiveAssumeCache And removing duplicate tests	2025-09-13 13:15:24 +08:00
胡玮文	5a708a7ff0	scheduler/volumebinding: remove Get[API]{PV,PVC} should be replaced by generic Get[APIObj]	2025-09-13 00:26:46 +08:00
胡玮文	ed19492dc2	scheduler/volumebinding: passive assume cache Currently volume and dynamic-resource plugin shares an AssumeCache implementation. However, they features significantly different use case. DRA call Assume() on objects returned by APIServer, but volume call Assume() on objects yet to be sent to the APIServer. VolumeBinding plugin only makes one update request, while DynamicResource makes 2 requests (add finalizer then update allocation status). Taking advantage of this, currently the volume cache is simpler: 1. Reserve: assume PV/PVC will be updated 2. PreBind: really send the update request 3. AssumeCache receives an update from informer and overwrite the assumed state. a. if Prebind succeeded, this will surely include the update from step 2. b. if PreBind is not finished yet, and this is an irrelevant update, it is safe to overwrite the assumed state, because our update in PreBind will surely fail with Conflict. While for DynamicResource: 1. Reserve: add devices to inFlightAllocations 2. PreBind: a. send the 2 update requests b. add the returned object into AssumeCache c. AssumeCache dispatch events synchronously to update allocatedDevices d. remove devices from inFlightAllocations DynamicResource needs some features from AssumeCache that is not necessary for VolumeBinding: 1. DynamicResource needs strictly ordered update events to update allocatedDevices, including those from Assume() and Restore() 2. DynamicResource needs to compare ResourceVersion to prevent the assumed state from being overwritten by older version from informer. While this works, the doc[1] says: "you must not compare resource versions for greater-than or less-than relationships". Given so much difference, it can be beneficial to fork another simpler AssumeCache for VolumeBinding plugin. Because of no need to send events, the lite AssumeCache is a passive component. It only record the assumed version without copying all objects from informer into its local cache. When reading, we read from both informer and local cache. So it will always be up-to-date with informer, no need to wait for event handler. This resolves a race condition where AssumeCache and scheduler queue both receive events from informer. When a pod is scheduling due to PV update event, the PVCache may be not updated yet because it has not processed the relevant event. The passive version still listens events from informer, but only for cleaning up its local cache to save memory. [1]: https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions	2025-09-13 00:26:45 +08:00
胡玮文	c385a229d4	scheduler/volumebinding: introduce testInformer	2025-09-12 15:11:52 +08:00
胡玮文	eaf87d5907	scheduler/volumebinding: pass testing.T to helper	2025-09-12 14:29:02 +08:00
胡玮文	dce23dac03	scheduler/volumebinding: use subtest	2025-09-12 14:23:06 +08:00
Kubernetes Prow Robot	d602326b87	Merge pull request #133363 from yliaog/implicit Allow implicit extended resource name to be used no matter explicit extendedResourceName field is set or not in device class	2025-09-11 13:40:07 -07:00
yliao	74cf1db218	sort the device requests in the extended resource claim spec. removed the sortClaim in the unit test.	2025-09-11 16:55:58 +00:00
yliao	79f8d1b1c5	fixed bug such that implicit extended resource name can always be used, no matter the explicit extendedResourceName field in device class is set or not.	2025-09-10 14:10:40 +00:00
Maciej Skoczeń	3dfcda9afd	Fix minor inconsistencies in scheduler	2025-09-10 11:40:10 +00:00
Ania Borowiec	fadb40199f	Move interfaces: Handle and Plugin and related types from kubernetes/kubernetes to staging repo kube-scheduler	2025-09-02 09:42:53 +00:00
Kubernetes Prow Robot	b94b6ece10	Merge pull request #133707 from ania-borowiec/fitsports Remove use of pkg/scheduler/framework.NodeInfo in node_ports.go	2025-08-31 19:25:11 -07:00
Kubernetes Prow Robot	871857b0d0	Merge pull request #133608 from yliaog/flake added resourceClaimModified to bindClaim to decide whether to update assume cache	2025-08-29 15:23:08 -07:00
yliao	bf13cd1b81	added resourceClaimModified to bindClaim to decide whether to update assume cache	2025-08-29 16:12:55 +00:00
Ania Borowiec	b012e16b47	Remove use of pkg/scheduler/framework.NodeInfo in node_ports.go	2025-08-27 13:30:45 +00:00
Ania Borowiec	3c00c3cb29	Move GetAffinityTerms functions from pkg/scheduler/framework to staging repo	2025-08-26 13:39:49 +00:00
Abu Kashem	747a295cac	fix flake in dra test 'TestPlugin' TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue flakes due to the assumed cache not been synced with the initial store. The test waits until the registered handler used by the assumed cache has synced before proceeding with the test	2025-08-18 15:54:03 -04:00
Abu Kashem	c8ab780edb	dra plugin: assume claim after api call in bindClaim	2025-08-13 16:35:35 -04:00
yliao	2a026f6d65	1/ added retries to AssumeClaimAfterAPICall for the object which is not present in the cache (dynamicresources.go) 2/ modified the assume cache verification to not error out as long as the expected claim is in the cache, no matter its latest and api object are different or not. (dynamicresources_test.go). 3/ fixed nil panic as seen from https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133321/pull-kubernetes-integration/1952472629470302208	2025-08-06 07:08:58 +00:00
yliao	0a12f00e9d	fix nil panic in hasBindingConditions, it cannot assume claim has allocations	2025-07-30 14:44:41 +09:00
Sunyanan Choochotkaew	7f052afaef	KEP 5075: implement scheduler Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>	2025-07-30 09:52:49 +09:00
Sunyanan Choochotkaew	5ad969588d	KEP-5075: API updates Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>	2025-07-30 09:26:40 +09:00
yliao	34a64db2c7	extended resource backed by DRA: implementation	2025-07-29 18:55:21 +00:00
Kubernetes Prow Robot	e2ab840708	Merge pull request #130160 from KobayashiD27/dra-device-binding-conditions Implement DRA Device Binding Conditions (KEP-5007)	2025-07-29 07:34:26 -07:00
Kobayashi,Daisuke	e8c3af1f5c	KEP-5007 DRA Device Binding Conditions: Implement scheduler logic	2025-07-29 11:34:30 +00:00
Maciej Skoczeń	17d733e243	KEP-5229: Send API calls through dispatcher and cache	2025-07-25 15:35:36 +00:00
Kubernetes Prow Robot	a11bc701e8	Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler	2025-07-24 09:38:27 -07:00
Ania Borowiec	aecd37e6fb	Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler	2025-07-24 12:10:58 +00:00
Kubernetes Prow Robot	89a01ec72a	Merge pull request #133019 from pohly/dra-scheduler-plugin-owners DRA scheduler plugin: add pohly as approver	2025-07-24 03:42:33 -07:00
Patrick Ohly	5c4f81743c	DRA: use v1 API As before when adding v1beta2, DRA drivers built using the k8s.io/dynamic-resource-allocation helper packages remain compatible with all Kubernetes release >= 1.32. The helper code picks whatever API version is enabled from v1beta1/v1beta2/v1. However, the control plane now depends on v1, so a cluster configuration where only v1beta1 or v1beta2 are enabled without the v1 won't work.	2025-07-24 08:33:45 +02:00
Ed Bartosh	c2a06e7912	DRA: skip flaky test case on Windows Added a skipOnWindows flag to DynamicResources scheduler test case to skip test that relies on nanosecond timer precision. Windows timer granularity is much coarser than Linux, which causes the test to fail often.	2025-07-23 11:06:11 +03:00
Kensei Nakada	4b8dd9612f	cleanup: remove example plugins	2025-07-19 13:08:34 +09:00
Patrick Ohly	bc338e7505	DRA scheduler: implement filter timeout and cancellation The intent is to catch abnormal runtimes with the generously large default timeout of 10 seconds. We have to set up a context with the configured timeout (optional!), then ensure that both CEL evaluation and the allocation logic itself properly returns the context error. The scheduler plugin then can convert that into "unschedulable". The allocator and thus Filter now also check for context cancellation by the scheduler. This happens when enough nodes have been found.	2025-07-17 21:18:28 +02:00
Patrick Ohly	025c606e39	DRA scheduler: add plugin configuration The only option is the filter timeout. The implementation of it follows in a separate commit.	2025-07-17 16:47:47 +02:00
Patrick Ohly	ee38a00131	DRA scheduler: add DRASchedulerFilterTimeout feature gate Initializing the scheduler Features struct will be needed in different places, therefore NewSchedulerFeaturesFromGates gets introduced. Besides, having it next to the struct makes it easier to add new features. The DRASchedulerFilterTimeout feature gate simplifies disabling the timeout because setting a feature gate is often easier than modifying the scheduler configuration with a zero timeout value. The timeout and feature gate are new. The gate starts as beta and enabled by default, which is consistent with the "smaller changes with low enough risk that still may need to be disabled..." guideline.	2025-07-17 16:47:47 +02:00
Patrick Ohly	a2a3839a8e	DRA scheduler: add pohly as approver This is meant for simple changes, like code cleanup or API changes of the allocator code. For more complex changes and new features, SIG Scheduling approvers will be required to approve, as before.	2025-07-17 09:43:44 +02:00
yliao	dd3691b169	refactor allocator, removed claimsToAllocate from NewAllocator(), instead, passed it through Allocate()	2025-07-16 15:11:11 +00:00
Kubernetes Prow Robot	ab685237f0	Merge pull request #132391 from sanposhiho/pre-bind-pre-flight feat: add PreBindPreFlight and implement in in-tree plugins	2025-07-15 04:06:23 -07:00
Kubernetes Prow Robot	e3b20c07d6	Merge pull request #132870 from pohly/dra-allocator DRA: refactor claim allocator	2025-07-15 01:28:29 -07:00
Patrick Ohly	5caf7bca15	DRA allocator: refactor code The goal is to maintain different version of the allocator logic. We already had one incidence where adding an alpha feature caused a regression also when it was disabled. Not everything can be implemented within obviously correct if branches. This also opens the door for implementing different alternatives. The code just gets moved around for now.	2025-07-10 17:34:21 +02:00
Pawel Mechlinski	f2b24b9849	Increase verbosity of frequently printed loglines in binder plugin	2025-07-09 12:10:10 +00:00
Kensei Nakada	ebae419337	feat: add PreBindPreFlight and implement in in-tree plugins	2025-07-05 17:14:21 -07:00

1 2 3 4 5 ...

1373 commits