Commit graph

29158 commits

Author SHA1 Message Date
ShaanveerS
d800e7e052 test/images: bump agnhost to v2.61 2026-01-18 12:57:37 +01:00
Kubernetes Prow Robot
9bfe52e1fe
Merge pull request #136191 from saschagrunert/psi-memory-pressure-test
Skip memory pressure PSI test for CRI-O
2026-01-17 09:27:15 +05:30
Kubernetes Prow Robot
751ab64d57
Merge pull request #135837 from dgrisonnet/increase-resource-limits
test/e2e: fix pod resize test flakes on CRI-O/runc environments
2026-01-17 08:35:15 +05:30
Kubernetes Prow Robot
49f5ecc02c
Merge pull request #135874 from mochizuki875/make_general_profile_default
kubectl debug: make general profile default
2026-01-17 02:31:16 +05:30
Kubernetes Prow Robot
8de4a11252
Merge pull request #136156 from pohly/dra-upgrade-downgrade-refactor-2
DRA: upgrade/downgrade refactor, II
2026-01-16 23:31:15 +05:30
Kubernetes Prow Robot
08764697f4
Merge pull request #135381 from kannon92/mutable-pod-replacement-policy
[KEP-5440]: Add integration test for MutablePodResourcesForSuspendedJobs with Pod Replacement Policy = Failed
2026-01-16 19:29:16 +05:30
Sascha Grunert
19e1f9cce2
Skip memory pressure PSI test for CRI-O
Skip the memory pressure PSI test when running with CRI-O until automatic
memory.high configuration is available in the runtime. The test fails on
Fedora CoreOS due to different page cache reclaim behavior, and CRI-O is
implementing a fix to automatically set memory.high to 95% of memory.max
for cgroup v2 containers.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-01-16 09:03:01 +01:00
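The 95% rule mentioned in 19e1f9cce2 can be illustrated with a small arithmetic sketch. The helper name `memoryHigh` and the round-down behavior are assumptions made for illustration; the commit message does not say how CRI-O rounds the value.

```go
package main

import "fmt"

// memoryHigh returns 95% of memoryMax (both in bytes), rounded down.
// Hypothetical helper: the rounding CRI-O will actually use is not
// stated in the commit message.
func memoryHigh(memoryMax int64) int64 {
	// Divide before multiplying so very large limits cannot overflow,
	// then add the share contributed by the remainder.
	return memoryMax/100*95 + memoryMax%100*95/100
}

func main() {
	const limit = 512 * 1024 * 1024 // memory.max of 512 MiB
	fmt.Println(memoryHigh(limit))  // prints 510027366
}
```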
Patrick Ohly
1847d5b1a2 DRA e2e+integration: test ResourceSlice controller
The "create 100 slices" E2E sometimes flaked with timeouts (e.g. 95 out of 100
slices created). It created too much load for an E2E test.

The same test now uses ktesting as API, which makes it possible to run it as
integration test with the original 100 slices and with more moderate 10 slices
as E2E test.

(cherry picked from commit c47ad64820)
2026-01-16 08:10:37 +01:00
Patrick Ohly
11dcfc6c15 ktesting: replace Begin/End with TContext.Step
Manually pairing Begin with End is too error-prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.

By using a callback we can achieve a few things:

- Code using it automatically shadows the parent tCtx, thus enforcing
  that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
  in scope again.

Downsides:

- Extra boilerplate for the anonymous function.
  Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
  As an approximation of that `for tCtx := range tCtx.Step(...)` was
  tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.

(cherry picked from commit 047682908d)
2026-01-16 08:10:36 +01:00
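The callback pattern described in 11dcfc6c15 can be sketched as follows. This is not the ktesting API: `stepCtx`, its `Step` method, and `Prefix` are hypothetical stand-ins that only show how a callback parameter shadows the parent context inside the block and how the parent comes back into scope afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// stepCtx is a hypothetical stand-in for a test context.
// It is not the real ktesting TContext.
type stepCtx struct {
	steps []string
}

// Step invokes cb with a derived context that records the step name.
// The parent context is left unmodified.
func (c stepCtx) Step(name string, cb func(tCtx stepCtx)) {
	child := stepCtx{steps: append(append([]string{}, c.steps...), name)}
	cb(child)
}

// Prefix joins the active step names, e.g. for use in log output.
func (c stepCtx) Prefix() string {
	return strings.Join(c.steps, ": ")
}

func main() {
	tCtx := stepCtx{}
	tCtx.Step("bring up cluster", func(tCtx stepCtx) {
		// Inside the block, tCtx shadows the parent and carries the step.
		fmt.Println(tCtx.Prefix())
		tCtx.Step("start kubelet", func(tCtx stepCtx) {
			fmt.Println(tCtx.Prefix())
		})
	})
	// After the block, the unmodified parent is in scope again.
	fmt.Printf("%q\n", tCtx.Prefix())
}
```

The anonymous function is the extra boilerplate the commit message lists as a downside, and any variable declared inside it is not visible after the closing brace.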
Patrick Ohly
d44d0281eb DRA upgrade/downgrade: rewrite as Go unit test
tCtx.Run and sub-tests make it much simpler to separate the different steps
than with Ginkgo because unless a test runs tCtx.Parallel (which we don't do
here), everything runs sequentially in a deterministic order.

Right now we get:

    ...
        localupcluster.go:285: I1210 12:24:22.067524] bring up v1.34: stopping kubelet
        localupcluster.go:285: I1210 12:24:22.067548] bring up v1.34: stopping kube-scheduler
        localupcluster.go:285: I1210 12:24:22.067570] bring up v1.34: stopping kube-controller-manager
        localupcluster.go:285: I1210 12:24:22.067589] bring up v1.34: stopping kube-apiserver
    --- PASS: TestUpgradeDowngrade (94.78s)
        --- PASS: TestUpgradeDowngrade/after-cluster-creation (2.07s)
            --- PASS: TestUpgradeDowngrade/after-cluster-creation/core_DRA (2.05s)
            --- PASS: TestUpgradeDowngrade/after-cluster-creation/ResourceClaim_device_status (0.02s)
        --- PASS: TestUpgradeDowngrade/after-cluster-upgrade (4.10s)
            --- PASS: TestUpgradeDowngrade/after-cluster-upgrade/core_DRA (4.09s)
            --- PASS: TestUpgradeDowngrade/after-cluster-upgrade/ResourceClaim_device_status (0.01s)
        --- PASS: TestUpgradeDowngrade/after-cluster-downgrade (1.24s)
            --- PASS: TestUpgradeDowngrade/after-cluster-downgrade/core_DRA (1.21s)
            --- PASS: TestUpgradeDowngrade/after-cluster-downgrade/ResourceClaim_device_status (0.02s)
    PASS

It's even possible to use `-failfast` and
e.g. `-run=TestUpgradeDowngrade/after-cluster-creation/core_DRA`: `go test` then
runs everything up to that sub-test or any failing sub-test, then stops and
cleans up.

(cherry picked from commit de47714879)
2026-01-16 07:54:51 +01:00
Patrick Ohly
06d52b7702 CSI: revert introduction of context with cancellation
The traditional behavior of PodIO was to ignore the context. Changing that to
use the canceled context was risky because maybe some cleanup operation after
cancellation of the context wouldn't run anymore when it previously did.

However, this is theoretical. Tests all seemed to pass fine even without this
change.
2026-01-16 07:53:00 +01:00
Patrick Ohly
4a3d822689 DRA e2e: make driver deployment possible in Go unit tests
This leverages ktesting as wrapper around Ginkgo and testing.T to make all
helper code that is needed to deploy a DRA driver available to Go unit
tests and thus integration tests.

How to proceed with unifying helper code for integration and E2E testing is
open. This is just a minimal first step in that direction. Ideally, such
code should be in separate packages where usage of Ginkgo, e2e/framework
and gomega.Expect/Eventually/Consistently are forbidden.

While at it, the builder gets extended to make cleanup optional.
This will be needed for upgrade/downgrade testing with sub-tests.

(cherry picked from commit 7c7b1e1018)
2026-01-16 07:53:00 +01:00
Patrick Ohly
db36339d03 e2e framework: avoid memory overhead of ginkgo.GinkgoT
It turned out that ginkgo.GinkgoT() wasn't as cheap as it should have been (fix
coming in Ginkgo 2.27.5). When instantiated once for each framework.Framework
instance during init by all workers at the same time, the resulting spike in
overall memory usage within the container caused OOM killing of workers in Prow
jobs like ci-kubernetes-e2e-gci-gce with very tight memory limits.

Even with the upcoming fix in Ginkgo it makes sense to set the TB field only
while it really is needed, i.e. while a test runs. This is conceptually similar
to setting and unsetting the test namespace. It may help to flush out incorrect
usage of TB outside of tests.
2026-01-16 07:53:00 +01:00
Patrick Ohly
0d64cbff49 e2e framework: support creating TContext
This makes it possible to call helper packages which expect a TContext from E2E
tests.

The implementation uses GinkgoT as TB and supports registering cleanup
callbacks which expect a context. These callbacks then run with a context that
comes from ginkgo.DeferCleanup, just as if they had called that directly.

(cherry picked from commit 47b613eded)
2026-01-16 07:53:00 +01:00
Patrick Ohly
4864f45cc3 DRA upgrade/downgrade: split out individual test steps
This approach with collecting results from callbacks in a main ginkgo.It and
using them as failures in separate ginkgo.It callbacks might be the best that
can be done with Ginkgo.

A better solution is probably Go unit tests with sub-tests.

(cherry picked from commit 65ef31973c)
2026-01-16 07:52:55 +01:00
Patrick Ohly
7421eea877 ktesting: install signal handler on demand
The ktesting package is meant to be usable in E2E suites and then must not
affect signal handling in Ginkgo.
2026-01-16 07:51:29 +01:00
Kubernetes Prow Robot
e777bba0b2
Merge pull request #136243 from dims/mounttest-tmpfs-detection
Add tmpfs detection to mounttest and unit tests
2026-01-16 07:07:11 +05:30
Benjamin Elder
a60d114402 emeritus logicalhan, rest in peace
https://github.com/cncf/memorials/blob/main/han-kang.md
2026-01-15 12:18:02 -08:00
Kubernetes Prow Robot
1dde6f3475
Merge pull request #135584 from pohly/dra-upgrade-downgrade-tests
DRA testing: upgrade/downgrade preparation for 1.35
2026-01-15 22:41:40 +05:30
Davanum Srinivas
d3d78ce0dd
Add tmpfs detection to mounttest and unit tests
Refactor fsType() to use a platform-specific formatFsType() helper that
translates filesystem magic numbers to human-readable names. On Linux,
tmpfs filesystems now display as "tmpfs" instead of the raw magic number.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-15 12:02:56 -05:00
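A minimal sketch of the translation described in d3d78ce0dd might look like this. Only the name `formatFsType` and the "tmpfs" result come from the commit message; the function body, its signature, and the hex fallback for unknown types are assumptions. The constant is `TMPFS_MAGIC` as defined in the Linux header `linux/magic.h`.

```go
package main

import "fmt"

// tmpfsMagic is TMPFS_MAGIC from the Linux header linux/magic.h.
const tmpfsMagic = 0x01021994

// formatFsType translates a filesystem magic number into a readable name.
// Illustrative only: the body is an assumption, not the mounttest code.
func formatFsType(magic int64) string {
	switch magic {
	case tmpfsMagic:
		return "tmpfs"
	default:
		// Unknown types fall back to the raw magic number in hex.
		return fmt.Sprintf("0x%x", magic)
	}
}

func main() {
	fmt.Println(formatFsType(0x01021994)) // prints tmpfs
	fmt.Println(formatFsType(0xef53))     // prints 0xef53
}
```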
Kubernetes Prow Robot
0ba578f91f
Merge pull request #135393 from tosi3k/parallel-prebind
Run PreBind plugins in parallel
2026-01-15 12:39:34 +05:30
Kubernetes Prow Robot
e79df94fc8
Merge pull request #136064 from coillteoir/master
testing: reintegrate volume image e2e test
2026-01-15 07:27:34 +05:30
Kubernetes Prow Robot
8322d26d1f
Merge pull request #135462 from michaelasp/atomicReplace
Add atomic replace in client-go
2026-01-15 03:35:37 +05:30
Michael Aspinwall
9e25c19199 Add AtomicFIFO feature gate 2026-01-14 19:00:32 +00:00
Kubernetes Prow Robot
616fff8247
Merge pull request #131317 from bitoku/fix-static-init
Fix: Static pod status is always Init:0/1 if unable to get init container status
2026-01-15 00:27:38 +05:30
Kubernetes Prow Robot
c086a712b1
Merge pull request #136197 from bart0sh/PR217-use-benchtime-1x-in-scheduler_perf-doc
scheduler_perf: use -benchtime=1x in the test examples
2026-01-14 19:01:38 +05:30
David Lynch
df66e4728b Back out "Remove image volume e2e test because CI has containerd < 2.1"
Original commit changeset: 71ddb98ae4

user: David Lynch <davite3@protonmail.com>
2026-01-14 12:36:03 +00:00
Ed Bartosh
d966d9b89d scheduler_perf: use -benchtime=1x in the test examples
Update scheduler performance test examples to use `-benchtime=1x`
instead of `-benchtime=1ns` for explicitly running each benchmark
exactly once. This makes the intent clearer and aligns the examples
with recommended Go benchmark usage.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2026-01-14 11:07:32 +02:00
Kubernetes Prow Robot
62277ef5d2
Merge pull request #136199 from saschagrunert/master
Fix credential test by setting `AlwaysVerify` policy
2026-01-14 08:49:43 +05:30
Sascha Grunert
1abe2c4860
Fix credential test by setting AlwaysVerify policy
The test expects unauthorized pods to be blocked from accessing cached
private images, but the default policy (NeverVerifyPreloadedImages)
allows access to any image previously pulled by the kubelet.

Configure the kubelet to use AlwaysVerify policy for this test, which
enforces credential checks for all images regardless of pull history.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-01-13 13:52:43 +01:00
Kubernetes Prow Robot
9f6977db54
Merge pull request #136086 from richabanker/graduate-watch_list_duration_seconds-BETA
Graduate watch_list_duration_seconds to BETA
2026-01-13 15:29:37 +05:30
Kubernetes Prow Robot
d68d48073f
Merge pull request #136112 from danwinship/network-1.36-cleanup
Drop TopologyAwareHints and ServiceTrafficDistribution feature gates
2026-01-13 07:43:36 +05:30
Anish Ramasekar
900a8030f3
test: fix kind local registry config for kms ci jobs
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2026-01-12 12:59:14 -08:00
Kubernetes Prow Robot
210881f0f0
Merge pull request #135485 from saschagrunert/fix-device-plugin-termination-grace-period
Fix device plugin admission failure after container restart
2026-01-12 22:24:22 +05:30
Kubernetes Prow Robot
af6c58193c
Merge pull request #135369 from saschagrunert/serial-tests
test: Fix image credential pulls test node scheduling
2026-01-12 22:24:13 +05:30
Kubernetes Prow Robot
b89a81cfcb
Merge pull request #136167 from Karthik-K-N/update-funcs
DRA: remove deprecated test method usage, fix linter hints
2026-01-12 18:04:39 +05:30
Sascha Grunert
172a65c71d
Fix device plugin admission failure after container restart
When a container restarts before kubelet restarts, containerMap has
multiple entries (old exited + new running). GetContainerID() may
return the exited container, causing the running check to fail. Fixed
by checking if ANY container for the pod/name is running.

Also filter terminal pods from podresources since they no longer
consume resources, and fix test error handling to avoid exiting
Eventually immediately on transient errors.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-01-12 11:55:25 +01:00
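The fix described in 172a65c71d, checking whether any container for the pod/name pair is running instead of trusting a single lookup, can be sketched as below. The `containerEntry` type and `anyRunning` function are hypothetical; the kubelet's containerMap and its `GetContainerID()` use different types and signatures.

```go
package main

import "fmt"

// containerEntry is a hypothetical record for illustration.
// The kubelet's containerMap uses different types.
type containerEntry struct {
	ID      string
	PodUID  string
	Name    string
	Running bool
}

// anyRunning reports whether at least one container recorded for the
// given pod UID and container name is running.
func anyRunning(entries []containerEntry, podUID, name string) bool {
	for _, e := range entries {
		if e.PodUID == podUID && e.Name == name && e.Running {
			return true
		}
	}
	return false
}

func main() {
	// After a container restart, both the exited and the new container
	// are recorded under the same pod UID and container name.
	entries := []containerEntry{
		{ID: "old", PodUID: "pod-1", Name: "app", Running: false},
		{ID: "new", PodUID: "pod-1", Name: "app", Running: true},
	}
	fmt.Println(anyRunning(entries, "pod-1", "app")) // prints true
}
```

A lookup that returns only the first match would see the exited container here and report the pod as not running, which is the failure mode the commit message describes.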
Kubernetes Prow Robot
6df7e09ad9
Merge pull request #135911 from ShaanveerS/add-udp-agnhost
Add UDP and explicit TCP support to agnhost porter
2026-01-12 16:16:06 +05:30
Karthik Bhat
8962f08815 Remove deprecated test methods 2026-01-12 16:15:04 +05:30
Antoni Zawodny
833b7205fc Run PreBind plugins in parallel if feasible 2026-01-11 14:19:18 +01:00
Patrick Ohly
e999d595b1 testing: partial revert of E2E + DRA upgrade/downgrade
Refactoring the DRA upgrade/downgrade testing such that it runs as Go test
depended on supporting ktesting in the E2E framework. That change worked during
presubmit testing, but broke some periodic jobs. Therefore the relevant commits
from https://github.com/kubernetes/kubernetes/pull/135664/commits get reverted:

c47ad64820 DRA e2e+integration: test ResourceSlice controller
047682908d ktesting: replace Begin/End with TContext.Step
de47714879 DRA upgrade/downgrade: rewrite as Go unit test
7c7b1e1018 DRA e2e: make driver deployment possible in Go unit tests
65ef31973c DRA upgrade/downgrade: split out individual test steps
47b613eded e2e framework: support creating TContext

The last one is what must have caused the problem, but the other commits depend
on it.
2026-01-11 09:55:17 +01:00
ShaanveerS
d1867b4864 agnhost porter: add support for UDP and SERVE_TCP_PORT 2026-01-10 05:47:25 +01:00
Kubernetes Prow Robot
5151096d1f
Merge pull request #136096 from pacoxu/patch-14
Update error message expectation in criproxy_test
2026-01-10 05:47:47 +05:30
Kubernetes Prow Robot
c71eec3c3f
Merge pull request #135687 from yashsingh74/cni-bump
Update CNI plugins to v1.9.0
2026-01-10 04:57:41 +05:30
Richa Banker
8d2838a53e Graduate watch_list_duration_seconds to BETA 2026-01-09 13:37:20 -08:00
Kubernetes Prow Robot
da22735138
Merge pull request #136041 from richabanker/update-metrics-docs-1.35
Update metrics docs 1.35
2026-01-10 02:13:41 +05:30
Dan Winship
f278b47ecd Drop TopologyAwareHints and ServiceTrafficDistribution feature gates 2026-01-09 12:42:34 -05:00
Kubernetes Prow Robot
98e6935d43
Merge pull request #136140 from pohly/e2e-framework-tcontext-fix
E2E framework: fix nil pointer crash in TContext
2026-01-09 23:09:41 +05:30
Kubernetes Prow Robot
c68de67df3
Merge pull request #136132 from pohly/ktesting-default-verbosity
ktesting: avoid increasing default verbosity
2026-01-09 22:17:50 +05:30
Patrick Ohly
80cc14831e E2E framework: fix nil pointer crash in TContext
Not all framework instances have a default namespace. TContext
crashed for those.
2026-01-09 16:23:11 +01:00