The "create 100 slices" E2E test sometimes flaked with timeouts (e.g. only 95 out of 100
slices created). It created too much load for an E2E test.
The same test now uses ktesting as its API, which makes it possible to run it as an
integration test with the original 100 slices and as an E2E test with a more moderate
10 slices.
(cherry picked from commit c47ad64820)
Update scheduler performance test examples to use `-benchtime=1x`
instead of `-benchtime=1ns` for explicitly running each benchmark
exactly once. This makes the intent clearer and aligns the examples
with recommended Go benchmark usage.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Refactoring the DRA upgrade/downgrade testing so that it runs as a Go test
depended on support for ktesting in the E2E framework. That change worked during
presubmit testing but broke some periodic jobs. Therefore the relevant commits
from https://github.com/kubernetes/kubernetes/pull/135664/commits are reverted:
c47ad64820 DRA e2e+integration: test ResourceSlice controller
047682908d ktesting: replace Begin/End with TContext.Step
de47714879 DRA upgrade/downgrade: rewrite as Go unit test
7c7b1e1018 DRA e2e: make driver deployment possible in Go unit tests
65ef31973c DRA upgrade/downgrade: split out individual test steps
47b613eded e2e framework: support creating TContext
The last one is what must have caused the problem, but the other commits depend
on it.
The original implementation was inspired by how context.Context is handled via
wrapping a parent context. That approach had several issues:
- It is useful to let users call methods (e.g. tCtx.ExpectNoError)
instead of ktesting functions with a tCtx parameter, but that only
worked if all implementations of the interface implemented that
set of methods. This made extending those methods cumbersome (see
the commit which added Require+Assert) and could potentially break
implementations of the interface elsewhere, defeating part of the
motivation for having the interface in the first place.
- It was hard to see how the different TContext wrappers cooperated
with each other.
- Layering the injection of "ERROR" and "FATAL ERROR" on top of prefixing
with the klog header caused post-processing of a failed unit test to
remove that line because it looked like log output. Other log output
lines were kept because they were not indented.
- In Go <=1.25, the `go vet` printf check only works for functions and
methods if they get called directly and themselves directly pass their
parameters on to fmt.Sprint. The check does not work when calling
methods through an interface. Support for that is coming in Go 1.26,
but will depend on bumping the Go version also in go.mod and thus
may not be immediately possible in Kubernetes.
- Interface documentation in
https://pkg.go.dev/k8s.io/kubernetes@v1.34.2/test/utils/ktesting#TContext
is a monolithic text block. Per-method documentation is more readable and allows
referencing those methods with [] (e.g. [TC.Errorf] works, [TContext.Errorf]
didn't).
The revised implementation is a single struct with (almost) no exported
fields. The two exceptions are the embedded context.Context, which avoids having
to write wrappers for several functions, and the embedded TB, which is necessary
because Helper cannot be wrapped. As with a logr.LogSink, With* methods can make a
shallow copy and then change some fields in the cloned instance.
The former `ktesting.TContext` interface is now a type alias for
`*ktesting.TC`. This ensures that existing code using ktesting doesn't need to
be updated, and such code is also a bit more compact (`tCtx
ktesting.TContext` instead of `tCtx *ktesting.TC` when not using such an
alias). Hiding that it is a pointer might discourage accessing the exported
fields because it looks like an interface.
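A small sketch of why the alias keeps existing call sites compiling. `TC`, its `name` field and `existingHelper` are hypothetical. Because `TContext` is an alias (`=`) and not a new defined type, `TContext` and `*TC` are the same type, so no conversion is needed in either direction:

```go
package main

// TC stands in for the concrete test-context struct (hypothetical minimal field).
type TC struct{ name string }

// TContext is an alias: TContext and *TC are identical types.
type TContext = *TC

// Name is declared on *TC and is therefore also available through TContext.
func (tc *TC) Name() string { return tc.name }

// existingHelper keeps its old signature; callers can pass a *TC directly.
func existingHelper(tCtx TContext) string { return tCtx.Name() }
```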
Output gets fixed and improved such that:
- "FATAL ERROR" and "ERROR" are at the start of the line, followed by the klog header.
- The failure message follows in the next line.
- Continuation lines are always indented.
The set of methods exposed via TB is now a bit more complete (Attr, Chdir).
All former stand-alone With* functions are now also available as methods and
should be used instead of the functions, which will be removed.
Linting of log calls now works and found some issues.
Reading numPreFilterCalled races with writing it in the scheduler, at least as
far as the data race detector is concerned. That the test waits for pod
scheduling before reading the counter is too indirect a synchronization for the
race detector. enqueuePlugin.called has the same problem,
but hasn't triggered the race detector (yet).
We need to protect against concurrent access. The easiest way to enforce that
is via atomic.Int64: unlike a field guarded by a mutex, it cannot be accessed
without synchronization by mistake.
Shutting down the scheduler first was also tried, but didn't work out because
"teardown" does more than just stop the scheduler: it also cancels a
context that is needed during test shutdown.
The `// +build` constraint syntax has been replaced by `//go:build` for a long time now.
Removal of the old build tag was automated with:
for i in $(git grep -l '^// +build' | grep -v -e '^vendor/'); do if ! grep -q '^// Code generated' "$i"; then sed -i -e '/^\/\/ +build/d' "$i"; fi; done
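The per-file logic of the shell loop above can be sketched in Go as follows. `stripPlusBuild` is a hypothetical helper that operates on file contents only. File discovery and the `vendor/` exclusion are left to the caller. As in the loop, files containing a line that starts with `// Code generated` are left untouched, and otherwise every line starting with `// +build` is dropped:

```go
package main

import "strings"

// stripPlusBuild mirrors the sed/grep loop for one file's content.
func stripPlusBuild(src string) string {
	lines := strings.Split(src, "\n")
	// Generated files are skipped entirely, like `grep -q '^// Code generated'`.
	for _, l := range lines {
		if strings.HasPrefix(l, "// Code generated") {
			return src
		}
	}
	// Drop old-style constraints, like `sed -e '/^\/\/ +build/d'`.
	kept := make([]string, 0, len(lines))
	for _, l := range lines {
		if !strings.HasPrefix(l, "// +build") {
			kept = append(kept, l)
		}
	}
	return strings.Join(kept, "\n")
}
```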
The ipallocator was blindly assuming that all errors are retryable, which
caused the allocator to try to exhaust all possibilities to allocate an IP
address.
If the error is not retryable, the allocator generates as many API calls as
there are available IPs in the allocator, causing CPU exhaustion since these
requests come from inside the apiserver.
In addition to handling the error correctly, this patch also interprets the
error to return the right status code depending on the error type.
Co-authored-by: carlory <baofa.fan@daocloud.io>
This test verifies that when a job with PodReplacementPolicy=Failed is
suspended and pods are terminating, resource updates can be made to the
job but new pods are only created after terminating pods are removed.
The new pods should have the updated resources.
* First version of batching w/out signatures.
* First version of pod signatures.
* Integrate batching with signatures.
* Fix merge conflicts.
* Fixes from self-review.
* Test fixes.
* Fix a bug that limited batches to size 2
Also add some new high-level logging and
simplify the pod affinity signature.
* Re-enable batching on perf tests for now.
* fwk.NewStatus(fwk.Success)
* Review feedback.
* Review feedback.
* Comment fix.
* Two plugin-specific unit tests.
* Add cycle state to the sign call, apply to topo spread.
Also add unit tests for several plugin signature
calls.
* Review feedback.
* Switch to distinct stats for hint and store calls.
* Switch signature from string to []byte
* Revert cyclestate in signs. Update node affinity.
Node affinity now sorts all of the various
nested arrays in the structure. CycleState no
longer in signature; revert to signing fewer
cases for pod spread.
* hack/update-vendor.sh
* Disable signatures when extenders are configured.
* Update pkg/scheduler/framework/runtime/batch.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Update staging/src/k8s.io/kube-scheduler/framework/interface.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Review feedback.
* Disable node resource signatures when extended DRA enabled.
* Review feedback.
* Update pkg/scheduler/framework/plugins/imagelocality/image_locality.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Update pkg/scheduler/framework/interface.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Update pkg/scheduler/framework/plugins/nodedeclaredfeatures/nodedeclaredfeatures.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Update pkg/scheduler/framework/runtime/batch.go
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
* Review feedback.
* Fixes for review suggestions.
* Add integration tests.
* Linter fixes, test fix.
* Whitespace fix.
* Remove broken test.
* Unschedulable test.
* Remove go.mod changes.
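Sorting nested arrays (as described for node affinity above) makes a signature independent of the order in which equivalent terms are written. A minimal sketch, assuming node affinity is simplified to a hypothetical `[][]string` of terms and `nodeAffinitySignature` is an illustrative name. It returns `[]byte` as in the final version of the change. Inner slices are sorted on a copy, then the outer slice is sorted, and the result is JSON-encoded so element boundaries stay unambiguous (requires Go 1.21+ for `slices`):

```go
package main

import (
	"encoding/json"
	"slices"
)

// nodeAffinitySignature returns the same bytes for equivalent term lists,
// regardless of element order. The input is not modified.
func nodeAffinitySignature(terms [][]string) []byte {
	norm := make([][]string, len(terms))
	for i, t := range terms {
		c := slices.Clone(t)
		slices.Sort(c)
		norm[i] = c
	}
	slices.SortFunc(norm, func(a, b []string) int { return slices.Compare(a, b) })
	out, _ := json.Marshal(norm) // marshalling [][]string cannot fail
	return out
}
```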
---------
Co-authored-by: Maciej Skoczeń <87243939+macsko@users.noreply.github.com>
Remove reference to internal types in kuberc types
* Remove unserialized types from public APIs
Also remove defaulting
* Don't do conversion gen for plugin policy types
Because the plugin policy types are explicitly allowed to be empty, they
should not affect conversion. The autogenerated conversion functions for
the `Preference` type will leave those fields empty.
* Remove defaulting tests
Comments and simplifications (h/t jordan liggitt)
Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
- Rescheduling on binding failure using taints or device removal
- Triggering binding timeout when BindingConditions remain unmet
- Recovery to a device without BindingConditions after binding timeout
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>