Commit graph

135313 commits

Todd Neal
4ba4211660 fix: normalize nil ports to empty slice in NewPortMapKey
Prevents EndpointSlice churn for headless services where
getEndpointPorts returns [] but existing slices from the API
have nil ports, causing different hash values.
2026-01-27 15:33:51 +00:00
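
A minimal sketch of the hashing pitfall behind this fix, with hypothetical types rather than the controller's real ones: fmt's %#v verb renders a nil slice and an empty slice differently, so any hash fed from that formatting diverges unless nil is normalized first.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    type endpointPort struct {
        Name string
        Port int32
    }

    // newPortMapKey is a toy stand-in for NewPortMapKey: without the
    // normalization, nil and [] hash differently because %#v formats
    // them differently.
    func newPortMapKey(ports []endpointPort) uint64 {
        if ports == nil {
            ports = []endpointPort{} // the fix: normalize nil to an empty slice
        }
        h := fnv.New64a()
        fmt.Fprintf(h, "%#v", ports)
        return h.Sum64()
    }

    func main() {
        var nilPorts []endpointPort
        fmt.Println(newPortMapKey(nilPorts) == newPortMapKey([]endpointPort{})) // true with the fix
    }
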
Kubernetes Prow Robot
be658b44f3
Merge pull request #136441 from kannon92/remove-alpha-api-dra
remove alpha comments for GA or beta resource fields
2026-01-27 20:16:00 +05:30
Kubernetes Prow Robot
ab78ad32d1
Merge pull request #136194 from bart0sh/PR216-add-signode-approvers-to-dra-owners
add SIG-Node approvers to DRA dirs
2026-01-27 20:15:52 +05:30
Patrick Ohly
2ec0305d72 client-go informers: replace time.Sleep with callback
While time.Sleep is what the test needs, an arbitrary hook invocation is more
acceptable in the production code because it is more general.
2026-01-27 14:48:32 +01:00
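
The pattern, sketched with hypothetical names rather than the actual client-go identifiers: production code calls a package-level no-op hook at the point where the test previously needed a sleep, and a test can swap in whatever synchronization it wants.

    package main

    // syncHook is a no-op in production; a test can replace it with a
    // time.Sleep, a channel receive, or any other synchronization.
    var syncHook = func() {}

    func processItems(items []string) {
        for range items {
            syncHook() // generic injection point instead of a hard-coded sleep
            // ... real work ...
        }
    }

    func main() {
        processItems([]string{"a", "b"})
    }
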
Kubernetes Prow Robot
1087ff613a
Merge pull request #136454 from ania-borowiec/log_illegal_state
Log error when UpdatePod finds no existing PodGroup for the pod
2026-01-27 18:01:50 +05:30
docktofuture
57315c1974 call fake.NewClientset instead of fake.NewSimpleClientset 2026-01-27 13:00:47 +01:00
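
For context, fake.NewClientset is the drop-in replacement for the deprecated fake.NewSimpleClientset; a minimal usage sketch:

    package main

    import (
        "context"
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes/fake"
    )

    func main() {
        // fake.NewClientset replaces the deprecated fake.NewSimpleClientset.
        client := fake.NewClientset(&corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
        })
        pod, err := client.CoreV1().Pods("default").Get(context.Background(), "demo", metav1.GetOptions{})
        fmt.Println(pod.Name, err)
    }
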
Ania Borowiec
48c4605408
Add error logging when UpdatePod finds no existing PodGroup for the pod to update 2026-01-27 11:42:03 +00:00
Kubernetes Prow Robot
7cf00c96ac
Merge pull request #136554 from dims/fix-kubeproxy-ipv6-conntrack
test: Fix KubeProxy CLOSE_WAIT test for IPv6 environments (and where /proc/net/nf_conntrack may be missing)
2026-01-27 16:55:49 +05:30
Davanum Srinivas
06bd2191ca
Apparently some EC2 images we use do not have /proc/net/nf_conntrack
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-27 10:29:10 +00:00
Kubernetes Prow Robot
5c9977b892
Merge pull request #136202 from RomanBednar/fix-csi-plugin-backoff
csi: raise kubelet CSI init backoff to cover ~140s DNS delays
2026-01-27 15:47:48 +05:30
docktofuture
eedbe162d1 Fix route controller condition update when external CNI sets NetworkUnavailable 2026-01-27 11:15:03 +01:00
Davanum Srinivas
65c981be7a
test: cleanup from review
- Use netutils.IsIPv6(ip) instead of manual nil/To4 check
- Remove unnecessary ip.To16() call since IPv6 is already 16 bytes
- Remove ipFamily from grep pattern since IP format ensures correctness

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-27 09:31:04 +00:00
Patrick Ohly
65693b2d2a ktesting: fix setting up progress reporting
The recent change to support importing ktesting into an E2E suite
without progress reporting was flawed:
- If a Go unit test had a deadline (the default when invoked
  by `go test`!), the early return skipped initializing progress
  reporting.
- When it didn't, for example when invoking a test binary directly
  under stress, a test created goroutines which kept running, breaking
  leak checking in e.g. an integration test's TestMain.

The revised approach uses reference counting: as long as some unit test is
running, the progress reporting with its required goroutines stays active.
When the last one ends, they get cleaned up, which keeps the goleak
checker happy.
2026-01-27 10:13:43 +01:00
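
The scheme is the classic first-in-starts, last-out-stops refcount. A simplified sketch with hypothetical names, not the actual ktesting code:

    package main

    import "sync"

    var (
        mu     sync.Mutex
        users  int
        stopCh chan struct{}
    )

    // acquire starts the shared progress-reporting goroutine for the
    // first active test; release stops it after the last test ends, so
    // goroutine leak checkers see nothing left running.
    func acquire() {
        mu.Lock()
        defer mu.Unlock()
        if users++; users == 1 {
            stopCh = make(chan struct{})
            go func(stop <-chan struct{}) { <-stop }(stopCh)
        }
    }

    func release() {
        mu.Lock()
        defer mu.Unlock()
        if users--; users == 0 {
            close(stopCh)
        }
    }

    func main() {
        acquire()       // e.g. from a unit test's setup
        defer release() // cleanup when the test ends
    }
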
Sascha Grunert
6b6c596a60
Revert GetImageRef to use Image.Id instead of RepoDigests
Partially reverts cb011623c8 from #135369.

Using RepoDigests[0] as image identity causes credential verification
issues because it makes identity location-dependent (registry.io/image@sha256:...)
instead of content-based (sha256:...). This defeats deduplication and
creates separate pull records for identical image content from different
registries.

ImagePulledRecord already handles per-registry credentials via its
two-level design: ImageRef identifies content, CredentialMapping tracks
registry-specific credentials.

Related: #136498, #136549
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-01-27 09:41:04 +01:00
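
The two identity formats in question, illustrated with a made-up digest (not kubelet code):

    package main

    import "fmt"

    func main() {
        // Location-dependent identity: the same bytes pulled from two
        // registries yield different RepoDigests entries.
        repoDigest := "registry.io/team/app@sha256:abc123..."
        // Content-based identity: Image.Id names the bytes themselves,
        // so records keyed by it deduplicate across registries.
        imageID := "sha256:abc123..."
        fmt.Println(repoDigest, imageID)
    }
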
Patrick Ohly
d91be59690 ktesting: run time-sensitive unit tests as synctest
The TestCause tests were already unreliable in the CI. The others failed under
stress.

As a synctest we have to be more careful about how the parent context for
TestCause is constructed and cleaned up (it must happen inside the bubble), but
once that's handled we can reliably measure the (fake) time and compare exactly
against expected results.
2026-01-27 09:18:10 +01:00
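
For readers unfamiliar with synctest (testing/synctest, part of the standard library as of Go 1.25): inside the bubble, time is fake and only advances when every goroutine is durably blocked, so sleeps finish instantly and elapsed durations compare exactly. A minimal sketch, not the ktesting tests themselves:

    package example

    import (
        "testing"
        "testing/synctest"
        "time"
    )

    func TestFakeTime(t *testing.T) {
        synctest.Test(t, func(t *testing.T) {
            start := time.Now()
            time.Sleep(10 * time.Second) // completes instantly on the fake clock
            if got := time.Since(start); got != 10*time.Second {
                t.Fatalf("want exactly 10s, got %v", got)
            }
        })
    }
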
Sascha Grunert
59e3b9137e
Fix image volume subPath test and add feature tag
Change /etc/os-release to /etc/passwd in subPath test to avoid
symlink issues with Alpine 3.21 (kitten:1.8).

Add Feature:ImageVolume tag to properly categorize tests for CI.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-01-27 08:43:06 +01:00
Davanum Srinivas
27c59678f0
test: Fix KubeProxy CLOSE_WAIT test for IPv6 environments
The /proc/net/nf_conntrack file uses fully expanded IPv6 addresses
with leading zeros in each 16-bit group. For example:
  fc00:f853:ccd:e793::3 -> fc00:f853:0ccd:e793:0000:0000:0000:0003

Add expandIPv6ForConntrack() helper function to expand IPv6 addresses
to the format used by /proc/net/nf_conntrack before using them in
the grep pattern.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-27 00:35:25 +00:00
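
A sketch of such a helper (the in-tree version may differ in details): render each of the eight 16-bit groups of the 16-byte address as four zero-padded hex digits.

    package main

    import (
        "fmt"
        "net"
        "strings"
    )

    // expandIPv6ForConntrack renders ip as eight zero-padded 16-bit hex
    // groups, the format used by /proc/net/nf_conntrack.
    func expandIPv6ForConntrack(ip net.IP) string {
        ip16 := ip.To16()
        groups := make([]string, 8)
        for i := range groups {
            groups[i] = fmt.Sprintf("%02x%02x", ip16[2*i], ip16[2*i+1])
        }
        return strings.Join(groups, ":")
    }

    func main() {
        ip := net.ParseIP("fc00:f853:ccd:e793::3")
        fmt.Println(expandIPv6ForConntrack(ip))
        // fc00:f853:0ccd:e793:0000:0000:0000:0003
    }
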
Kubernetes Prow Robot
028015267e
Merge pull request #136116 from vinayakankugoyal/ga
KEP:2862 Graduate to STABLE.
2026-01-27 05:09:49 +05:30
Kubernetes Prow Robot
2c9cc8da1a
Merge pull request #135763 from darshansreenivas/admissionregistratio_k8s_io_ValidationAction
feat: wire admissionregistration group for declarative validation and +k8s:required to ValidatingAdmissionPolicyBindingSpec.ValidationActions
2026-01-27 03:23:53 +05:30
Kevin Hannon
da76c98b4d Use Log instead of Logf for job integration where we don't have any variadic arguments 2026-01-26 16:39:00 -05:00
Kubernetes Prow Robot
c70b61069d
Merge pull request #136465 from cpanato/update-go-1256
Bump images and versions to go 1.25.6 and distroless iptables
2026-01-27 01:23:50 +05:30
Anish Ramasekar
a1478c7730
Drop StructuredAuthorizationConfiguration feature gate
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2026-01-26 11:35:39 -06:00
Carlos Panato
5e54532bf4
Bump images and versions to go 1.25.6 and distroless iptables
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
2026-01-26 18:04:17 +01:00
Kubernetes Prow Robot
efc15394a1
Merge pull request #135573 from brejman/issue-129733-score-update
Update scoring function for balanced allocation to consider change to the node's balance
2026-01-26 21:49:52 +05:30
Ed Bartosh
2f82dc6dce kubelet: DRA: claiminfo: improve logging
- got rid of embedding the logger in a struct
- added a logging prefix
2026-01-26 17:43:38 +02:00
Ed Bartosh
acff01fe8b kubelet: DRA: healthinfo: set logging prefix 2026-01-26 17:43:38 +02:00
Ed Bartosh
7933d90815 kubelet/dra: get rid of Background calls
Removed almost all remaining context.Background and klog.Background
calls, passing a context or logger instead.
2026-01-26 17:43:38 +02:00
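
The pattern behind the cleanup, with hypothetical function names: thread the caller's context through instead of minting a fresh one mid-call-chain.

    package main

    import (
        "context"
        "fmt"
    )

    // Before: the helper minted its own context deep in the call chain:
    //   func prepare() { ctx := context.Background(); ... }
    // After: the caller's context is threaded through; kubelet code can
    // then recover a logger from it via klog.FromContext(ctx).
    func prepare(ctx context.Context) error {
        return ctx.Err()
    }

    func main() {
        // Background is acceptable only at the entry point.
        fmt.Println(prepare(context.Background()))
    }
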
Kubernetes Prow Robot
3d544b9142
Merge pull request #136474 from ShaanveerS/netpol-revive-135706
Netpol: reduce e2e pod churn with agnhost porter
2026-01-26 20:59:59 +05:30
Kubernetes Prow Robot
53b29a3a2c
Merge pull request #136269 from pohly/dra-scheduler-double-allocation-fixes
DRA scheduler: double allocation fixes
2026-01-26 20:59:50 +05:30
Patrick Ohly
001ec49eb6 DRA integration: more pods per node, more parallelism
Long-running tests like TestDRA/all/DeviceBindingConditions (42.50s)
should run in parallel with other tests; otherwise the overall runtime is too
high.

This in turn requires allowing more pods per node to avoid blocking scheduling.
2026-01-26 15:44:49 +01:00
Patrick Ohly
2198d96520 DRA integration: add "uses all resources" test
This corresponds to an E2E test which sometimes (but very rarely) flaked in the
CI.
2026-01-26 15:44:48 +01:00
Patrick Ohly
581ee0a2ec DRA scheduler: fix another root cause of double device allocation
GatherAllocatedState and ListAllAllocatedDevices need to collect information
from different sources (allocated devices, in-flight claims), potentially even
multiple times (GatherAllocatedState first gets allocated devices, then the
capacities).

The underlying assumption that nothing bad happens in parallel is not always
true. The following log snippet shows how an update of the assume
cache (feeding the allocated devices tracker) and of the in-flight claims can
land such that GatherAllocatedState doesn't see the device in that claim as
allocated:

    dra_manager.go:263: I0115 15:11:04.407714      18778] scheduler: Starting GatherAllocatedState
    ...
    allocateddevices.go:189: I0115 15:11:04.407945      18066] scheduler: Observed device allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-094" claim="testdra-all-usesallresources-hvs5d/claim-0553"
    dynamicresources.go:1150: I0115 15:11:04.407981      89109] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680"
    dra_manager.go:201: I0115 15:11:04.408008      89109] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 version="1211"
    dynamicresources.go:1157: I0115 15:11:04.408044      89109] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680" allocation=<
        	{
        	  "devices": {
        	    "results": [
        	      {
        	        "request": "req-1",
        	        "driver": "testdra-all-usesallresources-hvs5d.driver",
        	        "pool": "worker-5",
        	        "device": "worker-5-device-094"
        	      }
        	    ]
        	  },
        	  "nodeSelector": {
        	    "nodeSelectorTerms": [
        	      {
        	        "matchFields": [
        	          {
        	            "key": "metadata.name",
        	            "operator": "In",
        	            "values": [
        	              "worker-5"
        	            ]
        	          }
        	        ]
        	      }
        	    ]
        	  },
        	  "allocationTimestamp": "2026-01-15T14:11:04Z"
        	}
         >
    dra_manager.go:280: I0115 15:11:04.408085      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-095" claim="testdra-all-usesallresources-hvs5d/claim-0086"
    dra_manager.go:280: I0115 15:11:04.408137      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-096" claim="testdra-all-usesallresources-hvs5d/claim-0165"
    default_binder.go:69: I0115 15:11:04.408175      89109] scheduler: Attempting to bind pod to node pod="testdra-all-usesallresources-hvs5d/my-pod-0553" node="worker-5"
    dra_manager.go:265: I0115 15:11:04.408264      18778] scheduler: Finished GatherAllocatedState allocatedDevices=<map[string]interface {} | len:2>: {

Initial state: "worker-5-device-094" is in-flight, not in cache
- goroutine #1: starts GatherAllocatedState, copies cache
- goroutine #2: adds to assume cache, removes from in-flight
- goroutine #1: checks in-flight

=> device never seen as allocated

This is the second reason for double allocation of the same device in two
different claims. The other was timing in the assume cache. Both were
tracked down with an integration test (separate commit). It did not fail
all the time, but enough that regressions should show up as flakes.
2026-01-26 15:44:48 +01:00
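
Stripped of the DRA specifics, the interleaving is a snapshot-then-check bug across two stores. A deterministic toy illustration (not the scheduler's data structures):

    package main

    import "fmt"

    func main() {
        cache := map[string]bool{}               // assume cache contents
        inFlight := map[string]bool{"dev": true} // in-flight claims

        // goroutine #1: GatherAllocatedState copies the cache first ...
        seen := map[string]bool{}
        for d := range cache {
            seen[d] = true
        }

        // goroutine #2 interleaves: the claim moves from in-flight to cache.
        cache["dev"] = true
        delete(inFlight, "dev")

        // goroutine #1: ... then checks the in-flight claims.
        for d := range inFlight {
            seen[d] = true
        }

        fmt.Println(seen["dev"]) // false: the device is never seen as allocated
    }
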
Kubernetes Prow Robot
584add12b6
Merge pull request #136457 from tosi3k/workload-helper
Extract helper methods from gang scheduling plugin
2026-01-26 20:01:51 +05:30
Bartosz
720d648d2f
Remove outdated test for scoring zero request pods 2026-01-26 14:26:52 +00:00
Bartosz
56ca09911f
Refactor resource allocation tests to be more readable 2026-01-26 14:26:46 +00:00
Bartosz
8f5f69bc70
Change scoring function for balanced allocation 2026-01-26 14:22:46 +00:00
Kubernetes Prow Robot
437184c055
Merge pull request #136292 from atombrella/feature/modernize_plusbuild
Remove obsolete `// +build` instruction.
2026-01-26 19:05:59 +05:30
Kubernetes Prow Robot
ac2ce676c1
Merge pull request #136249 from Yuvraj02/qos-cgroup-cpu-shares-test
kubelet: add unit tests for QoS CPU shares update
2026-01-26 19:05:51 +05:30
Maciej Szulik
80e70c52c0
Run make update
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2026-01-26 14:26:27 +01:00
Maciej Szulik
a291714883
Generate applyconfigurations for sample-apiserver
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2026-01-26 14:23:55 +01:00
Maciej Szulik
46082bd565
Generate applyconfigurations for kube-aggregator
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2026-01-26 14:23:51 +01:00
Tore Lønøy
82a636f1f3 kube-dns bump to v1.26.7 2026-01-26 14:07:55 +01:00
Antoni Zawodny
8b39544d60 Extract helper methods from gang scheduling plugin 2026-01-26 13:45:26 +01:00
Kubernetes Prow Robot
e0cd8e3897
Merge pull request #136529 from dims/fix-kube-proxy-conntrack-test
test: Read /proc/net/nf_conntrack instead of using conntrack binary
2026-01-26 17:35:52 +05:30
Joel Speed
a984ba0bd9
Mark up PodGroupPolicy with openapi union member tags 2026-01-26 11:42:13 +00:00
ShaanveerS
46e9b9e671 feat(kal): enforce optional/required on imagepolicy API group 2026-01-26 12:26:31 +01:00
Davanum Srinivas
7f2c4535c3
test: Read /proc/net/nf_conntrack instead of using conntrack binary
The distroless-iptables image no longer includes the conntrack binary
as of v0.8.7 (removed in kubernetes/release#4223 since kube-proxy no
longer needs it after kubernetes#126847).

Update the KubeProxy CLOSE_WAIT timeout test to read /proc/net/nf_conntrack
directly instead of using the conntrack command. The file contains the
same connection tracking data and is accessible from the privileged
host-network pod.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-26 07:42:07 +00:00
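
The replacement boils down to reading and filtering a file. A hedged sketch of the approach, not the e2e test code itself:

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // The file holds one tracked connection per line; filtering for a
        // state and address replaces `conntrack -L | grep ...`.
        data, err := os.ReadFile("/proc/net/nf_conntrack")
        if err != nil {
            fmt.Println("conntrack table not exposed:", err) // e.g. some EC2 images
            return
        }
        for _, line := range strings.Split(string(data), "\n") {
            if strings.Contains(line, "CLOSE_WAIT") {
                fmt.Println(line)
            }
        }
    }
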
Kubernetes Prow Robot
f4eedc41b8
Merge pull request #136518 from atombrella/feature/modernize_forvar
Remove redundant re-assignments in for-loops under tests
2026-01-26 12:25:08 +05:30
Mads Jensen
757647786d Remove redundant re-assignments in for-loops in test/{e2e,integration,utils}
The modernize forvar rule was applied. There are more details in this blog
post: https://go.dev/blog/loopvar-preview
2026-01-25 22:58:27 +01:00
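
The forvar rule removes per-iteration copies of the loop variable, which Go 1.22's per-iteration loop variable semantics made redundant. A representative example of the line it deletes:

    package example

    import "testing"

    func TestCases(t *testing.T) {
        testCases := []struct{ name string }{{"a"}, {"b"}}
        for _, tc := range testCases {
            tc := tc // redundant since Go 1.22; this is the line forvar deletes
            t.Run(tc.name, func(t *testing.T) {
                t.Parallel()
                _ = tc.name
            })
        }
    }
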
Patrick Ohly
e6ef79b2f6 client-go informers: fix potential deadlock
In the unlikely situation that sharedProcessor.distribute was triggered by a
resync before sharedProcessor.run had a chance to start the listeners, the
sharedProcessor deadlocked: sharedProcessor.distribute held a read/write lock
on listenersLock while being blocked on the write to the listener's
channel. The listeners that would have read from those channels never got
started because sharedProcessor.run was blocked trying to get a read lock on
listenersLock.

This gets fixed by releasing the read/write lock in sharedProcessor.distribute
while waiting for all listeners to be started. Because either all or no
listeners are started, the existing global listenersStarted boolean is
sufficient.

The TestListenerResyncPeriods test now runs twice, with and without the
artificial delay. It gets converted to a synctest, so it executes quickly
despite the time.Sleep calls and timing is deterministic. The enhanced log
output confirms that with the delay, the initial sync completes later:

    === RUN   TestListenerResyncPeriods
        shared_informer_test.go:236: 0s: listener3: handle: pod1
        shared_informer_test.go:236: 0s: listener3: handle: pod2
        shared_informer_test.go:236: 0s: listener1: handle: pod1
        shared_informer_test.go:236: 0s: listener1: handle: pod2
        shared_informer_test.go:236: 0s: listener2: handle: pod1
        shared_informer_test.go:236: 0s: listener2: handle: pod2
        shared_informer_test.go:236: 2s: listener2: handle: pod1
        shared_informer_test.go:236: 2s: listener2: handle: pod2
        shared_informer_test.go:236: 3s: listener3: handle: pod1
        shared_informer_test.go:236: 3s: listener3: handle: pod2
    --- PASS: TestListenerResyncPeriods (0.00s)
    === RUN   TestListenerResyncPeriodsDelayed
        shared_informer_test.go:236: 1s: listener1: handle: pod1
        shared_informer_test.go:236: 1s: listener1: handle: pod2
        shared_informer_test.go:236: 1s: listener2: handle: pod1
        shared_informer_test.go:236: 1s: listener2: handle: pod2
        shared_informer_test.go:236: 1s: listener3: handle: pod1
        shared_informer_test.go:236: 1s: listener3: handle: pod2
        shared_informer_test.go:236: 2s: listener2: handle: pod1
        shared_informer_test.go:236: 2s: listener2: handle: pod2
        shared_informer_test.go:236: 3s: listener3: handle: pod1
        shared_informer_test.go:236: 3s: listener3: handle: pod2
    --- PASS: TestListenerResyncPeriodsDelayed (0.00s)
2026-01-25 21:21:10 +01:00
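
The shape of the fix, in a simplified sketch (not the actual client-go types): copy what is needed under the lock, release it, and only then perform anything that can block on another goroutine.

    package main

    import "sync"

    type processor struct {
        listenersLock    sync.RWMutex
        listenersStarted bool
        listeners        []chan string
    }

    // distribute must not block on a channel send while holding
    // listenersLock: run() needs that same lock to start the listeners
    // that would drain the channels.
    func (p *processor) distribute(obj string) {
        p.listenersLock.RLock()
        started := p.listenersStarted
        listeners := append([]chan string(nil), p.listeners...)
        p.listenersLock.RUnlock() // released before the potentially blocking sends

        if !started {
            return // simplified; the real fix waits for startup outside the lock
        }
        for _, ch := range listeners {
            ch <- obj
        }
    }

    func main() {
        p := &processor{listenersStarted: true}
        p.distribute("event") // no listeners registered: the send loop is empty
    }
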