Commit graph

251 commits

Author SHA1 Message Date
DT1mote
cc1c20fdc0 fix: typo in etcd membership error message
Some checks are pending
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
Found a typo while working, quick fix.
It should display "This server is not a member of the etcd cluster" instead of "this server is a not a member of the etcd cluster"
Kind regards,

Signed-off-by: DT1mote <74531281+DT1mote@users.noreply.github.com>
2026-03-24 16:13:18 -07:00
Charlie Tonneslan
f40cf096c9 Fix typo: overriden -> overridden in snapshot_handler.go
Signed-off-by: Charlie Tonneslan <cst0520@gmail.com>
2026-03-24 16:12:56 -07:00
Brad Davidson
4cc440f2c9 Simplify snapshot compress/decompress logic
Some checks are pending
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
Compression creates a zipfile with the same path as the snapshot file
containing only the snapshot. Decompression can be a bit simpler by also
extracting to the same path, and erroring if there are unexpected
contents.

In retrospect we probably should have just gzip'd the snapshot file, but
I think there was some intention to observe the same behavior as RKE1,
which used zip files.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-16 16:06:49 -07:00
Brad Davidson
f4bb1e60c3 Use etcd-snapshot-retention as default for s3 if etcd-s3-retention is not set
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-10 12:10:40 -07:00
Brad Davidson
3f5eec4c4e Drop use of github.com/gorilla/mux
mux is replaced with a simple wrapper around http.ServeMux with middleware chain support

Unfortunately github.com/rootless-containers/rootlesskit/pkg/parent
still uses it so we can't drop the indirect dep yet.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
270484f01b Replace merr.NewErrors with errors.Join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
3acf8db8f2 Update packages to remove dep on archived github.com/pkg/errors
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
8908d5fcde Do not create etcd name file if etcd is not in use
etcd.setName was being called during managed driver creation, even if the managed driver (etcd) is not in use. Let etcd.Register handle calling setName.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-06 14:50:06 -08:00
Brad Davidson
d300004f29 Improve resilience of datastore bootstrap reconcile from etcd
Some checks are pending
govulncheck / govulncheck (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
* Add store tests with fixtures
* Try connecting to local etcd first, if it is available
* Handle panics from etcd backend code
* Don't try to read WAL and restore v3 snapshots as they almost never exist

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-26 10:13:04 -08:00
Brad Davidson
499e1b564b Fix removal of init node
Some checks are pending
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
Removing the initial node from the cluster would previously cause etcd to panic on startup. Fixes to etcd reconcile have stopped that from happening, but now the node will successfully come up and start a new cluster - which is not right either. Require either manual removal of DB files to create a new cluster, or setting server address to join an existing cluster.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-10 15:49:28 -08:00
Brad Davidson
1f66d51a99 Explicitly close mvcc backend
Fixes issue that could cause excessive CPU usage on first server in embedded-etcd cluster

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-05 14:44:55 -08:00
Brad Davidson
b3962bd057 Fix restart of control-plane-only nodes attempting to reconcile from local datastore
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 18:39:02 -08:00
Brad Davidson
d38b4b30cd Replace temporary etcd server with raw mvcc store access
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
fc506e56dd lint: unnecessary-format,use-errors-new
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
91a41d8c30 lint: unnecessary-stmt
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
62d2737faa lint: unchecked-type-assertion
Adds a generic wrapper around lru.Cache

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
26b4f21479 lint: indent-error-flow
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
f279a979b3 lint: exported
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
7c7e442be0 lint: empty-lines
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
100cb633a3 lint: duplicated-imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
23093122b0 lint: defer,get-return
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
316464975e lint: redundant-build-tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
a6c6cd15c0 Fix panic in test cleanup when client is unset
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 07:22:48 -08:00
Brad Davidson
c3ca02aa75 Move embed into separate package from executor
Better isolates the K3s implementation from the interface, and aligns
the package path with other projects executors. This should also remove
the indirect flannel dep from other projects that don't use the embedded
executor.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
713cf8fbde Use patch helper for etcd labels and annotations
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
2b39b6808a Use patch helper for etcd member controller
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
785cfad963 Use patch helper for etcd snapshot annotation patch
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
b7ca944774 Move etcd metrics to separate package
Allows importing pkg/metrics without pulling in pkg/etcd, which was causing an import loop in a follow-up commit.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-08 12:53:10 -08:00
Brad Davidson
f0d54528d0 Stop waiting on CRI ready if context is cancelled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-21 09:25:00 -08:00
Brad Davidson
7146e2000e Fix apiserver starting before remote etcd is up
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Has been cancelled
Fixes issue where the apiserver on control-plane-only nodes does not
actually wait for a connection to etcd to be available before starting.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-07 10:32:02 -08:00
Brad Davidson
171644cf0c Replace raw ListWatch with NewListWatchFromClient
NewListWatchFromClient replaces a bunch of boilerplate, and is also context-aware

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-10-27 15:06:45 -07:00
Brad Davidson
bfdcc7bcc8 Fix etcd member promotion
The `continue` was incorrectly changed to `return` when converting the
loop to an inline function in 4974fc7c24

Also addresses unnecessary creation of a new kubernetes client every
time the promotion check runs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-18 16:31:15 -07:00
Brad Davidson
4974fc7c24 Use sync.WaitGroup to avoid exiting before components have shut down
Currently only waits on etcd and kine, as other components
are stateless and do not need to shut down cleanly.

Terminal but non-fatal errors now request shutdown via context
cancellation, instead of just logging a fatal error.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
b61d6f3b81 Transfer cluster leadership before removing leader member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
659f2a7014 Fix perpetual etcd member removal
Fixes issue where member removal would be requeud until the node was deleted, or rejoined with a new name.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
f3a036a9b1
Bump kine for compact_rev_key watch fix
Fix apiserver-managed compact, and enable it

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-05 00:04:41 +00:00
Derek Nola
9314d84714
Bump grpc and update resolver
Signed-off-by: Derek Nola <derek.nola@suse.com>
2025-09-04 17:33:42 -06:00
Derek Nola
56ef1cd3a2
Update etcd to v3.6.4-k3s3
* Raft is now an independent dependency, with a seperate release version
* errors moved into their own subpackage
* set a default WarningUnaryRequestDuration

Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
2025-09-04 17:33:10 -06:00
Brad Davidson
f1c82392d0 Fix etcd join timeout handling
Error is deadline exceeded, not cancelled

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-27 13:41:54 -07:00
Brad Davidson
a9016f3dcb Add retry on etcd MemberAdd timeout
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-26 09:35:48 -07:00
Vitor Savian
a238f33cdd
Add retention flag specific for s3
* Add retention flag specific for s3
* Add retention for the unit tests:

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2025-07-28 13:42:09 -03:00
Brad Davidson
3a428ff02c Update metric help to be more descriptive.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-07-17 11:24:17 -07:00
Brad Davidson
5ce3db779d Update kine and use config defaults helper
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-07-11 10:10:13 -07:00
Brad Davidson
5cc51edafa Fix sqlite-etcd migration
Forgot to add new config to temporary kine in #12293

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-06-12 17:17:49 -07:00
Brad Davidson
db5390511e Switch from endpoints to endpointslices
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-06-09 11:28:02 -07:00
Brad Davidson
a8f0acbe52 Add CLI flag and config file for s3 bucket lookup type
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-05-07 11:50:22 -07:00
Brad Davidson
f90334e207 Fix etcd socket option config
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-11 13:39:44 -07:00
Brad Davidson
9deef77eef Add ReusePort/ReuseAddr flags to etcd config
Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-08 15:27:19 -07:00
Brad Davidson
a897f6875e Fix flakey etcd startup tests
Increase etcd shutdown delay to avoid "bind: address already in use" errors seen in CI. Also uses test TmpDir to ensure dir is cleaned up between tests.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-02 09:01:26 -07:00
Brad Davidson
0eeac6a622 Rework mock executor using gomock for call validation
Generate the mock executor with mockgen and convert existing uses of the mock executor to set it up properly.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-31 17:09:43 -07:00