3572 commits

Author SHA1 Message Date
kubernetes-prow[bot]
11ea3c2a46
Merge pull request #139896 from pohly/e2e-node-artifacts-default
E2E node: consider ARTIFACTS env variable for results dir
2026-06-21 13:53:36 +00:00
Patrick Ohly
8eed364890 E2E node: bump to -v4 for remote testing
The hard-coded verbosity in `make test-e2e-node` is 4
(17e2eda611/hack/make-rules/test-e2e-node.sh (L248)).
Pre-pending -v4 emulates that behavior, with the difference that an explicit
-v passed by the caller (typically kubetest2) could be used to override it.
2026-06-21 12:43:07 +02:00
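The override behavior described in this commit can be illustrated with Go's standard `flag` package, where a flag given twice keeps the later value. This is only a sketch: `withDefaultVerbosity` and `parseVerbosity` are hypothetical helpers, and the `-v=4` spelling is an assumption chosen to suit the standard `flag` package, not the code from the commit.

```go
package main

import "flag"

// withDefaultVerbosity puts a default verbosity flag in front of the
// caller's arguments (hypothetical helper). Because it comes first, any
// -v supplied by the caller is parsed later and replaces it.
func withDefaultVerbosity(args []string) []string {
	return append([]string{"-v=4"}, args...)
}

// parseVerbosity parses args with a private FlagSet and returns the
// final value of -v. When a flag is repeated, the last occurrence wins.
func parseVerbosity(args []string) (int, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	v := fs.Int("v", 0, "log verbosity")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *v, nil
}
```

Calling `parseVerbosity(withDefaultVerbosity(os.Args[1:]))` from a `main` function would yield 4 unless the caller passed its own `-v`.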
Patrick Ohly
fcc5d50fde E2E node: consider ARTIFACTS env variable for results dir
`make test-e2e-node` sets the -results-dir based on the ARTIFACTS Prow job env
variable. When e2e_node.test gets invoked directly, it should do the same,
otherwise JUnit and log files are not captured for the job.
2026-06-21 11:57:09 +02:00
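A minimal sketch of how such a lookup might work. The function name `resolveResultsDir`, the injected `getenv` parameter, and the precedence (explicit flag first, then `ARTIFACTS`, then a fallback) are assumptions made for illustration, not the logic from the commit.

```go
package main

// resolveResultsDir picks the directory for JUnit and log files.
// Assumed precedence: an explicit flag value, then the ARTIFACTS
// environment variable, then the fallback supplied by the caller.
// getenv is injected so the function can be exercised without
// modifying the process environment.
func resolveResultsDir(flagValue string, getenv func(string) string, fallback string) string {
	if flagValue != "" {
		return flagValue
	}
	if dir := getenv("ARTIFACTS"); dir != "" {
		return dir
	}
	return fallback
}
```

In a real program the call would look like `resolveResultsDir(*resultsDir, os.Getenv, "/tmp")`, where `resultsDir` is whatever flag variable the binary defines.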
Kubernetes Prow Robot
17e2eda611
Merge pull request #139745 from ngopalak-redhat/ngopalak/fix_is_xfs
Ensure is_xfs evaluates to true in quota tests
2026-06-20 14:19:38 +05:30
Francesco Romani
f0b952f2a1
e2e: node: consolidate more createPodSync calls
fix the missing instance which escaped the fix in
f7bd739f22

Signed-off-by: Francesco Romani <fromani@redhat.com>
2026-06-18 17:20:33 +02:00
Neeraj Krishna Gopalakrishna
0a36086243 Ensure is_xfs evaluates to true in quota tests
Signed-off-by: Neeraj Krishna Gopalakrishna <ngopalak@redhat.com>
2026-06-18 09:39:38 +05:30
Kubernetes Prow Robot
9d6e94a40d
Merge pull request #139741 from bart0sh/PR241-kubelet-podresources-utils-fix-contextual-todos
kubelet/podresources, kubelet/util/manager: propagate logger/context
2026-06-15 22:25:35 +05:30
Ed Bartosh
80e8baa8b8 kubelet/podresources: pass context to GetV1*Client
Replace context.TODO() with a context parameter passed by callers.
2026-06-15 13:13:51 +03:00
Kubernetes Prow Robot
57110af20d
Merge pull request #129079 from Tal-or/smtalignment_error
staticpolicy:smtalign: count for pre-allocated cpus for container
2026-06-15 14:27:23 +05:30
Talor Itzhak
e8e3fb93ee e2e:node: consider pre-allocated CPUs
This test verifies that pods with pre-allocated CPUs (from the checkpoint file)
are not rejected after kubelet restart when SMT alignment is enabled.
Regression test for the fix where the container presence check was moved
before the SMT alignment check.

The key is to request enough CPUs so that if pre-allocated CPUs are not
counted, the SMT alignment check would fail due to insufficient available
physical CPUs.

Calculate the maximum number of SMT-aligned CPUs we can request.
We need to request most of the allocatable CPUs to trigger the bug.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2026-06-14 12:17:32 +03:00
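The check ordering that this regression test guards can be sketched roughly as follows. Everything here is hypothetical: `admitContainer`, its parameters, and the simplified arithmetic are illustrative only and do not reproduce the static policy code.

```go
package main

import "fmt"

// admitContainer sketches the ordering described in the commit message:
// a container that already holds CPUs (for example, restored from the
// checkpoint file) is accepted before any SMT alignment math runs.
// All names and the simplified arithmetic are assumptions.
func admitContainer(alreadyAllocated bool, requestedCPUs, threadsPerCore, freeCPUs int) error {
	if alreadyAllocated {
		// Its CPUs are already accounted for, so they must not be
		// compared against the remaining free CPUs a second time.
		return nil
	}
	if requestedCPUs%threadsPerCore != 0 {
		return fmt.Errorf("request of %d CPUs is not a multiple of %d threads per core",
			requestedCPUs, threadsPerCore)
	}
	if requestedCPUs > freeCPUs {
		return fmt.Errorf("not enough free CPUs: requested %d, available %d",
			requestedCPUs, freeCPUs)
	}
	return nil
}
```

If the presence check came second, a restarted kubelet would compare the pod's request against free CPUs that no longer include the pod's own allocation and reject it, which is the failure the test is designed to trigger.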
Kubernetes Prow Robot
79751b17da
Merge pull request #137278 from humblec/update-npd-v1.35.2
Update node-problem-detector to v1.35.2 and remove addon manifests
2026-06-11 20:26:42 +05:30
Kubernetes Prow Robot
3841ba06c2
Merge pull request #139530 from QiWang19/cleanuppod-grace-period
Set short termination grace period for test pods in MemoryQoS tests
2026-06-11 08:04:49 +05:30
Humble Devassy Chirammal
05033bc8ca Update node-problem-detector to v1.35.2 and remove addon manifests
Update node-problem-detector from v1.34.0 to v1.35.2 and remove all
related addon manifests and install logic that is no longer needed:

- Update version in build/dependencies.yaml, test/e2e_node/image_list.go
  and test/kubemark/resources/hollow-node_template.yaml.
- Remove cluster/addons/node-problem-detector/ entirely. No e2e tests
  depend on these manifests: e2e_node tests create NPD pods inline and
  GCE standalone mode runs NPD as a systemd service.
- Remove install-node-problem-detector function and DEFAULT_NPD_* vars
  from cluster/gce/gci/configure.sh along with the conditional that
  invoked it, since NPD is no longer installed as a standalone binary
  via this script.
- Remove the setup-addon-manifests calls for node-problem-detector from
  cluster/gce/gci/configure-helper.sh since the source directory no
  longer exists.
- Remove stale refPaths in build/dependencies.yaml that pointed to the
  deleted addon files.

Signed-off-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
2026-06-10 14:04:57 +05:30
Qi Wang
82e38acb67 Set short termination grace period for test pods in MemoryQoS tests
2026-06-09 13:38:34 -04:00
Sergey Kanzhelev
d74b5907d5 builder pattern in cri client
2026-06-09 09:24:06 -07:00
HirazawaUi
e79d1a4271 Fix flaking e2e_node tests
2026-06-07 15:11:57 +08:00
Kubernetes Prow Robot
a0afe51e25
Merge pull request #139129 from pohly/e2e-node-update-local
E2E node: enable using release archives for periodic jobs, simplified
2026-06-03 22:09:47 +05:30
Patrick Ohly
de2d13b27e e2e_node: support pre-built binaries
This is not usable through "make test-e2e-node", which (while feasible) would
be a bit pointless because the Kubernetes source code would still be needed
for the make rules.

Instead, "kubetest2 noop -test=node" gets extended to invoke `e2e_node.test
remote` with flags that tell e2e_node.test where to find the binaries and
flags that were provided by the caller of kubetest2.
2026-06-03 10:32:48 +02:00
Patrick Ohly
2d574790a6 e2e_node: fix log output
fmt.Printf lacked the trailing newline and is inconsistent with other output,
which uses klog.
2026-06-03 08:34:56 +02:00
Patrick Ohly
6ba4d21765 e2e_node: multiplex different commands in e2e_node.test
The additional commands (mounter, gcp-credentials-provider) are needed for E2E
node testing. This change makes e2e_node.test entirely self-contained.

Copying the commands' code into separate packages is temporary and only done to
avoid touching them while it is still unclear whether this approach will work
out.

Besides avoiding changes to the build rules, bundling the functionality also has a
slight size advantage: the size of e2e_node.test increases by 10KB, whereas
the other two separate commands would add 10MB.
2026-06-03 08:34:56 +02:00
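One common way to multiplex several commands in a single binary is to dispatch on the first argument. The sketch below assumes that approach; the `command` type and `dispatch` function are hypothetical, and how e2e_node.test actually selects a command is not shown in this log.

```go
package main

// command is the entry point of one bundled tool. It receives the
// arguments that follow the command name and returns an exit code.
type command func(args []string) int

// dispatch runs the bundled command named by args[0] if there is one.
// Otherwise it hands all arguments to the fallback, which stands in
// for the binary's primary behavior. Hypothetical sketch.
func dispatch(args []string, commands map[string]command, fallback command) int {
	if len(args) > 0 {
		if cmd, ok := commands[args[0]]; ok {
			return cmd(args[1:])
		}
	}
	return fallback(args)
}
```

A `main` function would typically end with `os.Exit(dispatch(os.Args[1:], commands, runTests))`, where `runTests` is whatever the binary does by default.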
Patrick Ohly
071c858417 e2e_node: invoke make once for all targets
The caller does not need to enable or disable CGO explicitly, the build rules
do that automatically:

    $ make WHAT="cmd/kubelet cluster/gce/gci/mounter"
    +++ [0515 17:02:56] Building go targets for linux/amd64
        k8s.io/kubernetes/cluster/gce/gci/mounter (static)
        k8s.io/kubernetes/cmd/kubelet (non-static)

BuildGo builds the same targets as before. BuildTargets gets changed
to accept a list of targets from the caller, which is a more useful
package API.
2026-06-03 08:34:56 +02:00
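Building every target with one `make` invocation amounts to joining the target list into a single `WHAT=` argument, as in the shell transcript above. A minimal sketch, where `makeArgs` is a hypothetical helper rather than the `BuildTargets` implementation:

```go
package main

import "strings"

// makeArgs turns a list of build targets into the single WHAT=...
// argument that the make rules accept, so that make runs once for all
// targets instead of once per target. Hypothetical helper.
func makeArgs(targets []string) []string {
	return []string{"WHAT=" + strings.Join(targets, " ")}
}
```

The result can be passed to `exec.Command("make", makeArgs(targets)...)`. No shell quoting is needed because the whole assignment travels as one argument.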
Kubernetes Prow Robot
3c47d576e5
Merge pull request #137620 from Karthik-K-N/remove-hardcoded-volpath
test: Replace hardcoded kubelet volume paths with TestContext.KubeletRootDir in node e2e tests
2026-06-03 08:05:43 +05:30
Kubernetes Prow Robot
ec15ec6d09
Merge pull request #139377 from sohankunkerkar/fix-memoryqos-high-rollback
kubelet: clear stale memory.high on containers when MemoryQoS is disabled
2026-06-01 21:34:50 +05:30
Francesco Romani
9d9fd50e15 node: e2e: remove tests referring disable CPU quota
The DisableCPUQuotaWithExclusiveCPUs FG is now locked to true,
so we can remove all the tests referring to it.
Some of them were backward compatibility tests - no longer
needed if the FG is locked;
some other tests explicitly set the FG to true - no longer
needed either as the default is true and can't be changed anymore.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2026-06-01 09:51:23 +02:00
Davanum Srinivas
bf37c18d74
e2e: node: cpumanager: don't set the locked DisableCPUQuotaWithExclusiveCPUs gate to false
DisableCPUQuotaWithExclusiveCPUs is locked to its default (true) since v1.37, so any KubeletConfiguration that sets it to false is rejected and crash-loops the kubelet at startup. configureCPUManagerInKubelet wrote the gate unconditionally and the field defaults to false, so every CPU Manager test that reconfigured the kubelet hit it. Only set the gate when true, and skip the "CFS quota can be disabled" block that exercised the false path.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-05-30 21:47:39 -04:00
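The "only set the gate when true" rule can be sketched as below. `cpuManagerFeatureGates` is a hypothetical helper and the plain map stands in for the feature gate section of a kubelet configuration; it is not the `configureCPUManagerInKubelet` code.

```go
package main

// cpuManagerFeatureGates returns the feature gates to write into a
// kubelet configuration. The locked gate is written only when the
// caller asks for true. Leaving it out keeps the locked default,
// whereas writing false would be rejected at kubelet startup.
// Hypothetical helper for illustration.
func cpuManagerFeatureGates(disableCPUQuotaWithExclusiveCPUs bool) map[string]bool {
	gates := map[string]bool{}
	if disableCPUQuotaWithExclusiveCPUs {
		gates["DisableCPUQuotaWithExclusiveCPUs"] = true
	}
	return gates
}
```

Because a Go `bool` field defaults to false, writing the field unconditionally is what turned an unset option into an explicit, invalid `false`.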
Sohan Kunkerkar
0e5d54a29a kubelet: clear stale memory.high on containers when MemoryQoS is disabled
Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
2026-05-29 23:46:50 -04:00
hoteye
9c85877613 kubelet: pass logger into container ID parsing
Pass a logger into ParseContainerID instead of creating a klog.TODO inside the helper. This lets kubelet, prober, and node e2e call sites use their available contextual logger when container ID parsing fails.
2026-05-27 10:08:32 +08:00
Kubernetes Prow Robot
3fa9f8f97d
Merge pull request #139183 from hoteye/hoteye-util-boottime-context
kubelet: thread logger through boot time lookup
2026-05-23 17:08:42 +05:30
Kubernetes Prow Robot
31646c4d02
Merge pull request #139121 from carlory/update-kubelet-removal-1.38
kubelet: defer CRI fallback removal to 1.38
2026-05-22 23:10:59 +05:30
Kubernetes Prow Robot
ec8eaa5789
Merge pull request #139178 from sohankunkerkar/add-memory-events-metrics-test
Add memory.events metrics to container metrics test
2026-05-21 10:16:50 +05:30
Sohan Kunkerkar
f1cd17ea97 Add memory.events metrics to container metrics test
Verify container_memory_events_high_total and
container_memory_events_max_total are reported by cadvisor.
These counters were added in cadvisor v0.57.0 to expose
cgroup v2 memory.events for MemoryQoS observability.

KEP: https://github.com/kubernetes/enhancements/issues/2570
Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
2026-05-20 16:19:08 -04:00
Kubernetes Prow Robot
c9faa15c83
Merge pull request #139184 from pohly/e2e-node-flag-cleanup
e2e_node: avoid polluting e2e_node command line with helper packages
2026-05-20 23:54:53 +05:30
Patrick Ohly
62cfe57459 e2e_node: avoid polluting e2e_node command line with helper packages
e2e_node.test depends on test/e2e_node/builder and test/e2e_node/remote because
test/e2e_node/services/ uses some small helper functions from those two
packages. But e2e_node.test itself never builds any Go binaries, nor does it
run remote testing - that functionality is provided by the separate
test/e2e_node/runner commands.

Therefore these two packages should not put their command line flags into
flag.CommandLine because then they show up in the command line of e2e_node test
unnecessarily.

This change removes the following flags from the e2e_node.test command line:

    diff -r before/e2e_node after/e2e_node
    7,8d6
    <       --build-only                                                 If true, build e2e_node_test.tar.gz and exit.
    <       --cleanup                                                    If true remove files from remote hosts and delete temporary instances (default true)
    20d17
    <       --delete-instances                                           If true, delete any instances created (default true)
    42d38
    <       --ginkgo-flags string                                        Passed to ginkgo to specify additional flags such as --skip=.
    95d90
    <       --gubernator                                                 If true, output Gubernator link to view logs
    97d91
    <       --hosts string                                               hosts to test
    99,100d92
    <       --image-config-dir string                                    (optional) path to image config files
    <       --image-config-file string                                   yaml file describing images to run
    103d94
    <       --images string                                              images to test
    105,106d95
    <       --instance-name-prefix string                                prefix for instance names
    <       --k8s-bin-dir string                                         Directory containing k8s kubelet binaries.
    120d108
    <       --mode string                                                Mode to operate in. One of gce|ssh. Defaults to gce (default "gce")
    133d120
    <       --results-dir string                                         Directory to scp test results to. (default "/tmp/")
    142,145d128
    <       --ssh-env string                                             Use predefined ssh options for environment.  Options: gce
    <       --ssh-key string                                             Path to ssh private key.
    <       --ssh-options string                                         Commandline options passed to ssh.
    <       --ssh-user string                                            Use predefined user for ssh.
    160,161d142
    <       --target-build-arch string                                   Target architecture for the test artifacts for dockerized build (default "linux/amd64")
    <       --test-timeout duration                                      How long (in golang duration format) to wait for ginkgo tests to complete. (default 45m0s)
    196d176
    <       --test_args string                                           Space-separated list of arguments to pass to Ginkgo test runner.
    198d177
    <       --use-dockerized-build                                       Use dockerized build for test artifacts
2026-05-20 12:00:01 +02:00
hoteye
4d24257a5e kubelet: thread logger through boot time lookup
Pass a logger into GetBootTime so the Linux fallback path no longer creates a local context.TODO() only to derive a logger.

This keeps boot time lookup behavior unchanged and updates the node startup latency tracker constructor to accept a logger instead of a context, matching contextual logging migration guidelines.
2026-05-20 15:22:00 +08:00
carlory
f4d97c13f5 kubelet: defer CRI fallback removal to 1.38
2026-05-18 09:52:38 +08:00
Kubernetes Prow Robot
97d2d4a29f
Merge pull request #139073 from sohankunkerkar/fix/memoryqos-rollback-startup-cleanup
Use updateKubeletConfig helper in rollback tests
2026-05-15 23:24:36 +05:30
Kubernetes Prow Robot
908fa4852b
Merge pull request #139033 from saschagrunert/fix/container-metrics-direct-io
Use direct I/O for ContainerMetrics cadvisor test
2026-05-15 23:24:29 +05:30
Sohan Kunkerkar
7bef6a3ab1 Use updateKubeletConfig helper in rollback tests
Address review feedback to use the standard updateKubeletConfig helper
instead of manual WriteKubeletConfigFile + restartKubelet + waitForKubeletToStart.
2026-05-14 16:33:41 -04:00
Kubernetes Prow Robot
4f39ba34ff
Merge pull request #138903 from sohankunkerkar/fix/memoryqos-rollback-startup-cleanup
Clear stale MemoryQoS cgroup values at kubelet startup
2026-05-15 00:30:28 +05:30
Sascha Grunert
5843e8ce1a
Use direct I/O for ContainerMetrics cadvisor test
Overlayfs does not support cgroupv2 writeback accounting, so buffered
writes (even with conv=fsync) get attributed to the root cgroup instead
of the container's cgroup. This causes cadvisor to see an empty io.stat
for the container, making container_blkio_device_usage_total,
container_fs_reads_bytes_total, and container_fs_writes_bytes_total
permanently absent.

Switch to oflag=direct for writes and add iflag=direct reads to bypass
the page cache entirely. Direct I/O is always attributed to the issuing
process's cgroup regardless of filesystem type.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-05-13 14:36:29 +02:00
Kubernetes Prow Robot
874a7b40b0
Merge pull request #138617 from esotsal/kubeletHealthCheckRefactor
Move kubeletHealthCheck from e2enode to node as HealthCheck
2026-05-12 02:26:10 +05:30
Kubernetes Prow Robot
5cf56a97d5
Merge pull request #138851 from saschagrunert/fix/container-metrics-flake
Fix ContainerMetrics cadvisor test flake for block I/O metrics
2026-05-10 18:37:47 +05:30
Sotiris Salloumis
20c57876a4 Increase bound CPU limit to 2e+10 to fix admission api flaky test.
After the command used to increase UsageNanoCores was replaced to fix a previous flaky test,
UsageNanoCores exceeds the 2e+09 limit in some test environments. This commit
attempts to fix this by increasing the UsageNanoCores limit to 2e+10.
2026-05-09 09:46:23 +02:00
Sohan Kunkerkar
85d3992ac1 Clear stale MemoryQoS cgroup values at kubelet startup
When MemoryQoS is disabled after being previously enabled, stale
memory.min and memory.low values persist on QoS-class cgroups because
systemd re-applies stored properties on every SetUnitProperties call.

Fix this by including memory.min=0 and memory.low=0 in the existing
startup dbus calls (enforceNodeAllocatableCgroups for the root cgroup,
qosContainerManager.Start for the burstable cgroup). This overwrites
systemd's stored stale values so subsequent realizations re-apply 0.

Fixes https://github.com/kubernetes/kubernetes/issues/138436
KEP: https://github.com/kubernetes/enhancements/issues/2570

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
2026-05-08 13:14:59 -04:00
Kubernetes Prow Robot
4818833ecc
Merge pull request #138820 from esotsal/fix-sriov-cpumanager
Fix podresources flaky test: wait for Pod Resources V1 serving in flaky test
2026-05-08 00:05:18 +05:30
Sascha Grunert
ee9f8c6bde
Fix ContainerMetrics cadvisor test flakes
Replace the small echo write with a dd that uses conv=fsync to force
data through the block layer. Without fsync, the 11-byte echo writes
stay in page cache and never reach the block device within the
60-second test window. This leaves the cgroup io.stat empty, so
cadvisor does not emit container_blkio_device_usage_total,
container_fs_reads_bytes_total, or container_fs_writes_bytes_total
for the container.

The conv=fsync call guarantees block device I/O on every loop
iteration. Once io.stat has an entry for a device, all fields
(rbytes, wbytes, rios, wios) are present, even if zero, so all
cadvisor metrics pass their boundedSample(0, ...) checks.

Also increase the UsageCoreNanoSeconds upper bound from 1e11 to 1e12
for the container and pod-level CPU checks. The cumulative CPU time
can exceed 100s on slower architectures like ppc64le where the dd
CPU burner loop accumulates faster than expected.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2026-05-07 15:01:02 +02:00
Kubernetes Prow Robot
d92b8fe8f2
Merge pull request #138739 from zxqlxy/device-plugin-slow-register
Add e2e test for device plugin slow register
2026-05-07 11:42:31 +05:30
Sotiris Salloumis
acabaa7d50 Fix podresources flaky test: wait for Pod Resources V1 serving in flaky test
One podresources test was not waiting for Pod Resources V1 to be serving.
This can lead to flaky tests in a later step.

This change attempts to fix this flaky test by adding waitForPodResourcesV1Serving(ctx),
as done in the remaining tests. In addition, ExpectNoError was added to all attempts to
close a connection, to improve troubleshooting.
2026-05-07 05:35:17 +02:00
Xinyun Liu
62e23b9857 Add E2E test for multiple device plugins where the second one struggles to register
2026-05-06 23:48:32 +00:00
Paco Xu
11d08fcb7f
Revert "remove flaky label in SRIOV related tests"
2026-05-06 17:11:33 +08:00