Commit graph

240 commits

Author SHA1 Message Date
Brad Davidson
b3962bd057 Fix restart of control-plane-only nodes attempting to reconcile from local datastore
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 18:39:02 -08:00
Brad Davidson
d38b4b30cd Replace temporary etcd server with raw mvcc store access
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
fc506e56dd lint: unnecessary-format,use-errors-new
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
91a41d8c30 lint: unnecessary-stmt
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
62d2737faa lint: unchecked-type-assertion
Adds a generic wrapper around lru.Cache

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
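The commit above adds a generic wrapper around lru.Cache to satisfy the unchecked-type-assertion lint. A minimal sketch of that idea follows. It assumes a stand-in `untypedCache` (hypothetical, a plain map with no eviction) in place of the real cache library, and the names `Cache`, `NewCache`, `Add`, and `Get` are illustrative rather than the actual K3s identifiers.

```go
package main

import "fmt"

// untypedCache is a hypothetical stand-in for a cache API that stores and
// returns values as `any`. It is a plain map and does no LRU eviction.
type untypedCache struct {
	items map[any]any
}

func (c *untypedCache) Add(key, value any) { c.items[key] = value }

func (c *untypedCache) Get(key any) (any, bool) {
	v, ok := c.items[key]
	return v, ok
}

// Cache wraps the untyped cache so that callers never write a type
// assertion themselves.
type Cache[K comparable, V any] struct {
	inner *untypedCache
}

func NewCache[K comparable, V any]() *Cache[K, V] {
	return &Cache[K, V]{inner: &untypedCache{items: map[any]any{}}}
}

func (c *Cache[K, V]) Add(key K, value V) { c.inner.Add(key, value) }

// Get performs the one type assertion in a checked form and reports a miss
// if the stored value is absent or has an unexpected type.
func (c *Cache[K, V]) Get(key K) (V, bool) {
	var zero V
	raw, ok := c.inner.Get(key)
	if !ok {
		return zero, false
	}
	v, ok := raw.(V)
	if !ok {
		return zero, false
	}
	return v, true
}

func main() {
	c := NewCache[string, int]()
	c.Add("members", 3)
	v, ok := c.Get("members")
	fmt.Println(v, ok)
}
```

Keeping the assertion in one place means the lint finding is resolved once, instead of at every call site.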
Brad Davidson
26b4f21479 lint: indent-error-flow
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
f279a979b3 lint: exported
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
7c7e442be0 lint: empty-lines
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
100cb633a3 lint: duplicated-imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
23093122b0 lint: defer,get-return
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
316464975e lint: redundant-build-tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
a6c6cd15c0 Fix panic in test cleanup when client is unset
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 07:22:48 -08:00
Brad Davidson
c3ca02aa75 Move embed into separate package from executor
Better isolates the K3s implementation from the interface, and aligns
the package path with other projects' executors. This should also remove
the indirect flannel dep from other projects that don't use the embedded
executor.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
713cf8fbde Use patch helper for etcd labels and annotations
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
2b39b6808a Use patch helper for etcd member controller
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
785cfad963 Use patch helper for etcd snapshot annotation patch
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
b7ca944774 Move etcd metrics to separate package
Allows importing pkg/metrics without pulling in pkg/etcd, which was causing an import loop in a follow-up commit.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
f0d54528d0 Stop waiting on CRI ready if context is cancelled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-21 09:25:00 -08:00
Brad Davidson
7146e2000e Fix apiserver starting before remote etcd is up
Fixes issue where the apiserver on control-plane-only nodes does not
actually wait for a connection to etcd to be available before starting.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-07 10:32:02 -08:00
Brad Davidson
171644cf0c Replace raw ListWatch with NewListWatchFromClient
NewListWatchFromClient replaces a bunch of boilerplate, and is also context-aware

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-10-27 15:06:45 -07:00
Brad Davidson
bfdcc7bcc8 Fix etcd member promotion
The `continue` was incorrectly changed to `return` when converting the
loop to an inline function in 4974fc7c24

Also addresses unnecessary creation of a new kubernetes client every
time the promotion check runs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-18 16:31:15 -07:00
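The difference between `continue` and `return` described in the commit above can be sketched as follows. The original K3s code is not shown here, so this is only one hypothetical shape the mistake could take: a loop that lives inside an inline function, where `return` ends the whole pass instead of skipping one item. The `member` type and both function names are assumptions made for illustration.

```go
package main

import "fmt"

// member is a hypothetical stand-in for an etcd cluster member.
type member struct {
	name    string
	learner bool
}

// promoteWithReturn shows the bug: inside the inline function, `return`
// ends the whole pass at the first member that is not a learner.
func promoteWithReturn(members []member) []string {
	var promoted []string
	pass := func() {
		for _, m := range members {
			if !m.learner {
				return // abandons every member that follows
			}
			promoted = append(promoted, m.name)
		}
	}
	pass()
	return promoted
}

// promoteWithContinue skips only the current member and keeps checking
// the rest.
func promoteWithContinue(members []member) []string {
	var promoted []string
	pass := func() {
		for _, m := range members {
			if !m.learner {
				continue
			}
			promoted = append(promoted, m.name)
		}
	}
	pass()
	return promoted
}

func main() {
	members := []member{{"a", false}, {"b", true}, {"c", true}}
	fmt.Println(promoteWithReturn(members))
	fmt.Println(promoteWithContinue(members))
}
```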
Brad Davidson
4974fc7c24 Use sync.WaitGroup to avoid exiting before components have shut down
Currently only waits on etcd and kine, as other components
are stateless and do not need to shut down cleanly.

Terminal but non-fatal errors now request shutdown via context
cancellation, instead of just logging a fatal error.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
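A minimal sketch of the shutdown pattern the commit above describes: components block until the context is cancelled, and the caller waits on a sync.WaitGroup so it does not exit before they finish. The function name `runUntilShutdown` and the counter that stands in for real cleanup work are assumptions for illustration, not the K3s implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runUntilShutdown starts one goroutine per stateful component, requests
// shutdown by cancelling the context, and waits for every component to
// finish its cleanup. It returns how many components cleaned up.
func runUntilShutdown(components []string) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		cleaned int
	)
	for _, name := range components {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			<-ctx.Done() // block until shutdown is requested
			mu.Lock()
			cleaned++ // stands in for flushing and closing a datastore
			mu.Unlock()
			fmt.Println(name, "stopped cleanly")
		}(name)
	}

	cancel()  // a terminal but non-fatal error would request shutdown this way
	wg.Wait() // without this, the caller could exit in the middle of cleanup
	return cleaned
}

func main() {
	fmt.Println(runUntilShutdown([]string{"etcd", "kine"}))
}
```

Requesting shutdown through cancellation, rather than exiting from a fatal log call, is what gives the waiting goroutines a chance to run their cleanup at all.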
Brad Davidson
b61d6f3b81 Transfer cluster leadership before removing leader member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
659f2a7014 Fix perpetual etcd member removal
Fixes issue where member removal would be requeued until the node was deleted, or rejoined with a new name.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
f3a036a9b1 Bump kine for compact_rev_key watch fix
Fix apiserver-managed compact, and enable it

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-05 00:04:41 +00:00
Derek Nola
9314d84714 Bump grpc and update resolver
Signed-off-by: Derek Nola <derek.nola@suse.com>
2025-09-04 17:33:42 -06:00
Derek Nola
56ef1cd3a2 Update etcd to v3.6.4-k3s3
* Raft is now an independent dependency, with a separate release version
* errors moved into their own subpackage
* set a default WarningUnaryRequestDuration

Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
2025-09-04 17:33:10 -06:00
Brad Davidson
f1c82392d0 Fix etcd join timeout handling
Error is deadline exceeded, not cancelled

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-27 13:41:54 -07:00
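The distinction the commit above relies on can be shown with the standard library alone: a context that times out reports context.DeadlineExceeded, while one that is cancelled explicitly reports context.Canceled. The helper names below are hypothetical, and real etcd client errors may arrive wrapped in gRPC status errors, which this sketch does not cover.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// expiredErr returns the error from a context whose deadline has passed.
func expiredErr() error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-ctx.Done()
	return ctx.Err()
}

// cancelledErr returns the error from a context that was cancelled
// explicitly before any deadline.
func cancelledErr() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx.Err()
}

// isTimeout reports whether err is a timeout rather than a cancellation.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(isTimeout(expiredErr()))
	fmt.Println(isTimeout(cancelledErr()))
}
```

A timeout handler that checks only for context.Canceled would never fire for the first case, which matches the one-line description in the commit.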
Brad Davidson
a9016f3dcb Add retry on etcd MemberAdd timeout
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-26 09:35:48 -07:00
Vitor Savian
a238f33cdd Add retention flag specific for s3
* Add retention flag specific for s3
* Add retention for the unit tests

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2025-07-28 13:42:09 -03:00
Brad Davidson
3a428ff02c Update metric help to be more descriptive.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-07-17 11:24:17 -07:00
Brad Davidson
5ce3db779d Update kine and use config defaults helper
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-07-11 10:10:13 -07:00
Brad Davidson
5cc51edafa Fix sqlite-etcd migration
Forgot to add new config to temporary kine in #12293

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-06-12 17:17:49 -07:00
Brad Davidson
db5390511e Switch from endpoints to endpointslices
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-06-09 11:28:02 -07:00
Brad Davidson
a8f0acbe52 Add CLI flag and config file for s3 bucket lookup type
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-05-07 11:50:22 -07:00
Brad Davidson
f90334e207 Fix etcd socket option config
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-11 13:39:44 -07:00
Brad Davidson
9deef77eef Add ReusePort/ReuseAddr flags to etcd config
Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-08 15:27:19 -07:00
Brad Davidson
a897f6875e Fix flakey etcd startup tests
Increase etcd shutdown delay to avoid "bind: address already in use" errors seen in CI. Also uses test TmpDir to ensure dir is cleaned up between tests.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-02 09:01:26 -07:00
Brad Davidson
0eeac6a622 Rework mock executor using gomock for call validation
Generate the mock executor with mockgen and convert existing uses of the mock executor to set it up properly.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-31 17:09:43 -07:00
Brad Davidson
d45006be66 Move etcd ready channel into executor
This eliminates the final channel that was being passed around in an internal struct. The etcd management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
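The arrangement described in the commit above, where the caller supplies a func to poll and the executor closes a ready channel once it succeeds, can be sketched as follows. The names `startReadyPoller` and `pollsUntilReady`, the `func() bool` signature, and the polling interval are all assumptions for illustration; the real executor interface is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startReadyPoller calls check until it returns true, then closes the
// returned channel. If ctx is cancelled first, the channel is never closed.
func startReadyPoller(ctx context.Context, interval time.Duration, check func() bool) <-chan struct{} {
	ready := make(chan struct{})
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			if check() {
				close(ready)
				return
			}
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()
	return ready
}

// pollsUntilReady simulates a datastore that becomes ready on the given
// call and returns how many times check ran before the channel closed.
func pollsUntilReady(readyAfter int) int {
	calls := 0
	check := func() bool {
		calls++
		return calls >= readyAfter
	}
	ready := startReadyPoller(context.Background(), time.Millisecond, check)
	<-ready
	return calls
}

func main() {
	fmt.Println(pollsUntilReady(3))
}
```

With this shape, the code that knows how to test readiness and the code that owns the channel stay separate, and no channel has to be passed between config structs.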
Brad Davidson
72bbd676f1 Fix etcd tests to use mock executor
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
a8bc412422 Move container runtime ready channel into executor
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
d694dd1db9 Add periodic background snapshot reconcile
Interval is configurable with new etcd-snapshot-reconcile-interval flag

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 12:18:19 -08:00
Brad Davidson
bed1f66880 Avoid use of github.com/pkg/errors functions that capture stack
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 00:41:38 -08:00
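One stack-free way to create and wrap errors is the standard library, sketched below. The commit above does not say which replacement functions were chosen, so treating `errors.New` and `fmt.Errorf` with `%w` as the substitute is an assumption, and `errNotFound`, `loadSnapshot`, and `isNotFound` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a sentinel error created without capturing a stack trace.
var errNotFound = errors.New("snapshot not found")

// loadSnapshot wraps the sentinel with context. The %w verb keeps the
// original error reachable through errors.Is and errors.As.
func loadSnapshot(name string) error {
	return fmt.Errorf("failed to load snapshot %s: %w", name, errNotFound)
}

// isNotFound reports whether err wraps the sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := loadSnapshot("on-demand-1")
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```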
Brad Davidson
f940368747 Use etcd proxy to bootstrap control-plane-only nodes, if possible
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
5894af30ff Move CR APIs to k3s-io/api
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-24 11:17:27 -08:00
Brad Davidson
6199b79f4b Add etcd snapshot metrics
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-18 11:09:42 -08:00
Brad Davidson
4cacf6e1c0 Make etcd test linux-only
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-07 07:46:19 -08:00
Brad Davidson
0d028a2283 Add support for AWS shared credentials file
Also adds a CLI flag and fields for session token, which must be passed
alongside the access key and secret when using temporary credentials.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-29 00:45:56 -08:00
Brad Davidson
fd8348324d Disable s3 transport transparent compression/decompression
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-27 11:00:01 -08:00