
19 commits

Author SHA1 Message Date
Davanum Srinivas
0934916b90
test/e2e/node: explain v12.5 pin for cuda-samples on arm64
Document why cuda-samples is pinned to v12.5 rather than the latest
tag: it has to match the CUDA 12.5 toolkit in the base image and the
cuda-demo-suite-12-5 apt package used on x86_64. v13+ cuda-samples
also requires CUDA Toolkit 13.x and switched from make to CMake, so
bumping is a coordinated change across base image, apt package, git
tag, and build commands.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-04-20 07:07:50 -04:00
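The message above describes a coupling: the apt package name on x86_64 and the cuda-samples git tag on other architectures both have to track the CUDA 12.5 toolkit in the base image. The following is a minimal sketch, not code from gpu.go. It assumes that both strings could be derived from a single pinned version. The `cudaPin` type and its methods are hypothetical and are meant only to show why a bump has to be coordinated.

```go
package main

import "fmt"

// cudaPin is a HYPOTHETICAL single source of truth for the CUDA version the
// GPU sanity test targets. The base image is assumed to ship the same toolkit.
type cudaPin struct {
	Major, Minor int
}

// AptDemoSuitePackage returns the x86_64 apt package name,
// e.g. "cuda-demo-suite-12-5" for 12.5.
func (p cudaPin) AptDemoSuitePackage() string {
	return fmt.Sprintf("cuda-demo-suite-%d-%d", p.Major, p.Minor)
}

// SamplesTag returns the NVIDIA/cuda-samples git tag, e.g. "v12.5" for 12.5.
func (p cudaPin) SamplesTag() string {
	return fmt.Sprintf("v%d.%d", p.Major, p.Minor)
}

func main() {
	pin := cudaPin{Major: 12, Minor: 5}
	fmt.Println(pin.AptDemoSuitePackage())
	fmt.Println(pin.SamplesTag())
}
```

Deriving both names from one value keeps the apt package and the git tag from drifting apart. The base image and the build commands (make vs. CMake) still have to be changed by hand when the pin moves.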
Davanum Srinivas
6db917c42e
Update test/e2e/node/gpu.go
Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
2026-04-20 07:00:54 -04:00
Davanum Srinivas
ad41961d32
test/e2e/node: make GPU sanity test work on arm64 (sbsa)
The [Feature:GPUDevicePlugin] Sanity test embeds
`apt-get install -y cuda-demo-suite-12-5` under `set -e`. NVIDIA's CUDA
apt repo publishes cuda-demo-suite-* for x86_64 but NOT for sbsa
(confirmed against the public Packages index on
developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/{sbsa,x86_64}/).
On arm64 the install fails, the container exits 1, pod.Status.Phase
becomes Failed, and the subsequent `gomega.Expect(... .Equal(Succeeded))`
assertion trips.

Split the demo phase on architecture. On x86_64 keep the existing apt
path unchanged. On anything else, build deviceQuery / vectorAdd /
bandwidthTest from the public NVIDIA/cuda-samples repo instead.
busGrind is exclusive to cuda-demo-suite (no source equivalent in
cuda-samples) and is skipped on non-x86_64.

The pattern is the one already in production use by
sigs.k8s.io/dra-driver-nvidia-gpu in tests/bats/specs/gpu-cuda-demo-suite.yaml,
which has been green on Lambda gpu_1x_gh200.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-04-19 18:55:08 -04:00
Jiefeng Xu
b738ae6d97
test/e2e/node: handle quick pod completion in GPU startup wait
2026-03-01 11:50:57 -08:00
Jiefeng Xu
6e203664eb
test/e2e/node: reduce flakiness in GPU nvidia-smi test
2026-02-08 22:40:45 -08:00
carlory
5e54df3e72
Fix [Failing test] [sig-node] [Feature:GPUDevicePlugin] [Serial]-related tests
2025-06-19 15:32:23 +08:00
Jack Francis
53499d97ee
prefer error over bool, prefer Should(gomega.Succeed())
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-13 16:15:24 -07:00
Jack Francis
d54ff7441e
test: don't panic during an Eventually retry loop
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-13 13:57:55 -07:00
Davanum Srinivas
d3cbe2fe86
Re-add nvidia-gpu-device-plugin.yaml in test suite itself
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-27 14:23:57 -04:00
Davanum Srinivas
472ca3b279
skip control plane nodes, they may not have GPUs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 10:09:33 -04:00
Davanum Srinivas
349c7136c9
Wait for GPUs even for AWS kubetest2 ec2 harness
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 09:11:18 -04:00
Davanum Srinivas
1abbb00067
Double a couple of other timeouts
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 19:36:39 -04:00
Davanum Srinivas
92683139d7
Skip re-installation of GPU daemonset
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 13:54:12 -04:00
Davanum Srinivas
3d7d06e7cd
Bump timeout to account for slow GPU operations
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-20 20:52:51 -04:00
Davanum Srinivas
e516e003c5
Test MOAR GPU stuff
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-20 11:40:33 -04:00
Davanum Srinivas
3ec74e0984
split into 3 distinct tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-19 16:15:53 -04:00
Davanum Srinivas
630abc6929
Resurrect GPU tests that use Jobs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-19 11:30:49 -04:00
Davanum Srinivas
08a8cf7865
Install Nvidia Daemonset in test harness for GCE
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-18 19:58:17 -04:00
Davanum Srinivas
71bdcab2ad
Add some simple tests for nvidia GPU(s)
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-17 16:18:00 -04:00