Commit graph

1694 commits

Author SHA1 Message Date
Brad Davidson
a666b7905c Add context to controller event recorders
Fixes issue where RKE2 event recorder events were not logged to console due to lack of logging context.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-25 15:32:15 -07:00
DT1mote
cc1c20fdc0 fix: typo in etcd membership error message
Found a typo while working; quick fix.
The message should read "This server is not a member of the etcd cluster" instead of "this server is a not a member of the etcd cluster".
Kind regards,

Signed-off-by: DT1mote <74531281+DT1mote@users.noreply.github.com>
2026-03-24 16:13:18 -07:00
Charlie Tonneslan
f40cf096c9 Fix typo: overriden -> overridden in snapshot_handler.go
Signed-off-by: Charlie Tonneslan <cst0520@gmail.com>
2026-03-24 16:12:56 -07:00
Brad Davidson
4cc440f2c9 Simplify snapshot compress/decompress logic
Compression creates a zipfile with the same path as the snapshot file
containing only the snapshot. Decompression can be a bit simpler by also
extracting to the same path, and erroring if there are unexpected
contents.

In retrospect we probably should have just gzip'd the snapshot file, but
I think there was some intention to observe the same behavior as RKE1,
which used zip files.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-16 16:06:49 -07:00
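The single-entry behavior described in this commit can be sketched roughly as follows. This is an in-memory illustration, not the k3s implementation: the function names, the sentinel error, and the choice to name the zip entry after the base name of the snapshot path are all assumptions.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path/filepath"
)

// errUnexpectedContents is a hypothetical sentinel for archives that do not
// contain exactly the one expected snapshot entry.
var errUnexpectedContents = errors.New("unexpected contents in snapshot archive")

// compressSnapshot (hypothetical) builds a zip archive holding a single entry
// named after the base name of snapshotPath.
func compressSnapshot(snapshotPath string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(filepath.Base(snapshotPath))
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressSnapshot (hypothetical) extracts the snapshot and returns an error
// if the archive holds anything other than the single expected entry.
func decompressSnapshot(snapshotPath string, archive []byte) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	want := filepath.Base(snapshotPath)
	if len(zr.File) != 1 || zr.File[0].Name != want {
		return nil, fmt.Errorf("%w: want single entry %q", errUnexpectedContents, want)
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	archive, err := compressSnapshot("/tmp/example-snapshot", []byte("snapshot bytes"))
	if err != nil {
		panic(err)
	}
	data, err := decompressSnapshot("/tmp/example-snapshot", archive)
	fmt.Println(string(data), err)
}
```

Rejecting unexpected entries keeps the decompress path from having to guess which file in the archive is the snapshot.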
Brad Davidson
268322414f Bump containerd to v2.2.2
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-11 18:07:30 -07:00
Brad Davidson
f4bb1e60c3 Use etcd-snapshot-retention as default for s3 if etcd-s3-retention is not set
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-10 12:10:40 -07:00
Derek Nola
e4cb0e74e0 Save cluster state before re-encrypting secrets with newly created key (#13764)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2026-03-10 09:20:50 -07:00
Brad Davidson
3f5eec4c4e Drop use of github.com/gorilla/mux
mux is replaced with a simple wrapper around http.ServeMux with middleware chain support

Unfortunately github.com/rootless-containers/rootlesskit/pkg/parent
still uses it so we can't drop the indirect dep yet.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
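A wrapper of the kind this commit describes might look something like the sketch below. The `Router` type, its `Use` and `Handle` methods, and the `tag` middleware are hypothetical names chosen for illustration; only `http.ServeMux` and the `httptest` helpers are real standard-library APIs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware wraps a handler with additional behavior.
type Middleware func(http.Handler) http.Handler

// Router is a hypothetical thin wrapper around http.ServeMux that applies a
// middleware chain to every handler registered through it.
type Router struct {
	mux        *http.ServeMux
	middleware []Middleware
}

func NewRouter() *Router { return &Router{mux: http.NewServeMux()} }

// Use appends middleware; it applies to routes registered afterwards.
func (r *Router) Use(mw ...Middleware) { r.middleware = append(r.middleware, mw...) }

// Handle wraps h so that the first middleware added runs outermost.
func (r *Router) Handle(pattern string, h http.Handler) {
	for i := len(r.middleware) - 1; i >= 0; i-- {
		h = r.middleware[i](h)
	}
	r.mux.Handle(pattern, h)
}

func (r *Router) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	r.mux.ServeHTTP(w, req)
}

// tag is a demo middleware that records its name in the X-Chain header.
func tag(name string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
			w.Header().Add("X-Chain", name)
			next.ServeHTTP(w, req)
		})
	}
}

// newDemoRouter builds a router with two middleware and one route.
func newDemoRouter() *Router {
	r := NewRouter()
	r.Use(tag("outer"), tag("inner"))
	r.Handle("/ping", http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "pong")
	}))
	return r
}

// probe sends a GET request to h and reports the status code and X-Chain values.
func probe(h http.Handler, path string) (int, []string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Header().Values("X-Chain")
}

func main() {
	code, chain := probe(newDemoRouter(), "/ping")
	fmt.Println(code, chain)
}
```

Wrapping at registration time keeps the per-request path identical to a plain `http.ServeMux`; the trade-off is that middleware added after a route is registered does not apply to that route.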
Brad Davidson
270484f01b Replace merr.NewErrors with errors.Join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
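For readers unfamiliar with the standard-library replacement, here is a minimal sketch of how `errors.Join` (Go 1.20 and later) aggregates errors. The `cleanup` helper and the sentinel errors are hypothetical and are not taken from the k3s code.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errClose  = errors.New("close failed")
	errRemove = errors.New("remove failed")
)

// cleanup (hypothetical) combines an original error with any errors hit
// during cleanup. errors.Join drops nil values and returns nil if every
// argument is nil.
func cleanup(orig error, cleanupErrs ...error) error {
	return errors.Join(append([]error{orig}, cleanupErrs...)...)
}

func main() {
	err := cleanup(errors.New("write failed"), errClose, nil, errRemove)
	// The joined error still matches each wrapped error.
	fmt.Println(errors.Is(err, errClose), errors.Is(err, errRemove))
	fmt.Println(cleanup(nil, nil) == nil)
}
```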
Brad Davidson
3acf8db8f2 Update packages to remove dep on archived github.com/pkg/errors
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
6ffcd77ffd Bump klipper-lb and klipper-helm
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 15:11:09 -07:00
Brad Davidson
8908d5fcde Do not create etcd name file if etcd is not in use
etcd.setName was being called during managed driver creation, even if the managed driver (etcd) is not in use. Let etcd.Register handle calling setName.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-06 14:50:06 -08:00
Ada
de59b6327c Add nix-snapshotter support to the embedded containerd
Add support for the "nix" snapshotter, which enables running container
images built with nix2container. Nix images reference store paths
directly, avoiding layer tarballs and enabling deduplication through
the nix store.

Changes:
- Register nix-snapshotter as a builtin containerd plugin
- Add NixSupported() validation (checks nix-store is in PATH)
- Configure nix-snapshotter image service proxy in V2/V3 templates
  with containerd_address for CRI image operations
- Add Transfer service unpack_config with differ=walking for
  multi-arch support
- Use containerd state dir for socket path (rootless compatible)
- Disable NRI in rootless mode to prevent bind failures

Usage: k3s server --snapshotter nix

Signed-off-by: Ada <ada@6bit.com>
Co-Authored-By: Joshua Perry <josh@6bit.com>
2026-03-06 12:36:57 -08:00
Fabiano Fidêncio
b51167a996 config: add default imports to containerd base templates
Add imports to the generated containerd config so containerd loads
drop-in TOML files: config.toml.d for v2, config-v3.toml.d for v3
(e.g. /var/lib/rancher/k3s/agent/etc/containerd/config.toml.d and
/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d).

Also fix the v3 header comment to say config-v3.toml.tmpl instead
of config.toml.tmpl.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-02 14:20:29 -08:00
Brad Davidson
d300004f29 Improve resilience of datastore bootstrap reconcile from etcd
* Add store tests with fixtures
* Try connecting to local etcd first, if it is available
* Handle panics from etcd backend code
* Don't try to read WAL and restore v3 snapshots as they almost never exist

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-26 10:13:04 -08:00
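Handling panics from code that is called into, as the third bullet of this commit mentions, is commonly done by converting the panic into an ordinary error. The sketch below shows that general pattern; the `safely` helper is a hypothetical name and does not reflect how k3s wraps the etcd backend.

```go
package main

import "fmt"

// safely (hypothetical) runs fn and converts any panic it raises into an
// error, so the caller can fall back to another path instead of crashing.
func safely(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	err := safely(func() error {
		panic("backend is corrupt") // stands in for a panic in third-party code
	})
	fmt.Println(err)
}
```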
Derek Nola
2f527ff16b Revert "Move to rootlesskit v2 (#13486)"
This reverts commit f1b166f74f.

Signed-off-by: Derek Nola <derek.nola@suse.com>
2026-02-26 08:38:14 -08:00
Brad Davidson
499e1b564b Fix removal of init node
Removing the initial node from the cluster would previously cause etcd to panic on startup. Fixes to etcd reconcile have stopped that from happening, but now the node will successfully come up and start a new cluster - which is not right either. Require either manual removal of DB files to create a new cluster, or setting server address to join an existing cluster.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-10 15:49:28 -08:00
Brad Davidson
abad5b9fb0 Bump klipper-helm and klipper-lb images
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-10 14:56:47 -08:00
Brad Davidson
1f66d51a99 Explicitly close mvcc backend
Fixes issue that could cause excessive CPU usage on first server in embedded-etcd cluster

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-05 14:44:55 -08:00
zijiren
084c7aafc4 Fix VPN node IP not being applied to kubelet (#13457)
Signed-off-by: zijiren233 <pyh1670605849@gmail.com>
2026-02-04 10:16:09 -08:00
Brad Davidson
ce17fce058 Add helper function for including stack trace with error message
Not currently used, but was useful in tracking down the specific call path for the empty token handling

Prints error as:
> `msg="Error: starting kubernetes: failed to start cluster: failed to normalize server token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASSWORD> at github.com/urfave/cli/v2.(*App).RunContext(app.go:333)->github.com/urfave/cli/v2.(*Command).Run(command.go:269)->github.com/urfave/cli/v2.(*Command).Run(command.go:276)->github.com/k3s-io/k3s/pkg/cli/server.Run(server.go:48)->github.com/k3s-io/k3s/pkg/cli/server.run(server.go:629)->github.com/k3s-io/k3s/pkg/server.StartServer(server.go:74)->github.com/k3s-io/k3s/pkg/daemons/control.Server(server.go:72)->github.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start(cluster.go:75)->github.com/k3s-io/k3s/pkg/cluster.Save(storage.go:79)->github.com/k3s-io/k3s/pkg/util.NormalizeToken(token.go:51)"`

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-03 14:47:50 -08:00
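A helper producing output in roughly this shape could be built on `runtime.Callers`, as in the sketch below. The `withStack` and `normalize` names are hypothetical, and unlike the sample output above this version does not filter out Go runtime frames.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"runtime"
	"strings"
)

// withStack (hypothetical) appends the current call path to err, formatted as
// func(file:line)->func(file:line) from the outermost caller to the innermost.
func withStack(err error) error {
	if err == nil {
		return nil
	}
	pcs := make([]uintptr, 32)
	// Skip two frames: runtime.Callers and withStack itself.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	var calls []string
	for {
		frame, more := frames.Next()
		if frame.Function != "" {
			calls = append(calls, fmt.Sprintf("%s(%s:%d)",
				frame.Function, filepath.Base(frame.File), frame.Line))
		}
		if !more {
			break
		}
	}
	// Frames arrive innermost first; reverse them to read top-down.
	for i, j := 0, len(calls)-1; i < j; i, j = i+1, j-1 {
		calls[i], calls[j] = calls[j], calls[i]
	}
	return fmt.Errorf("%w at %s", err, strings.Join(calls, "->"))
}

// errEmptyToken and normalize are stand-ins for a real failing call site.
var errEmptyToken = errors.New("empty token")

func normalize(token string) error {
	if token == "" {
		return withStack(errEmptyToken)
	}
	return nil
}

func main() {
	fmt.Println(normalize(""))
}
```

Because the original error is wrapped with `%w`, callers can still match it with `errors.Is` after the call path has been attached.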
Brad Davidson
5e63bbe260 Handle empty token file as nonexistent
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-03 14:47:50 -08:00
Brad Davidson
d9c422a3ab Add IPv6 loopback to kubelet-serving cert
Fixes issue preventing containerd from accessing spegel on ipv6-primary agents. Only affects agents because only agents use the kubelet-serving cert for the supervisor listener.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-03 14:47:34 -08:00
Brad Davidson
e69d18614f Fix filter for wildcards
Wildcard entry should be bare `*` or `_default`, not a URL

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 20:14:55 -08:00
Brad Davidson
b3962bd057 Fix restart of control-plane-only nodes attempting to reconcile from local datastore
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 18:39:02 -08:00
Derek Nola
f1b166f74f Move to rootlesskit v2 (#13486)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2026-01-21 10:14:10 -08:00
Manuel Buil
c24294f24f Fix lines to satisfy lint
Signed-off-by: Manuel Buil <mbuil@suse.com>
2026-01-19 15:35:50 +01:00
Brad Davidson
2ed73bed39 Add deferred store implementation
Spegel insists on checking containerd features when the store is created, so defer creating it until after containerd is up

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-08 13:54:14 -08:00
Brad Davidson
efeacc1ed8 Bump spegel to v0.6.0
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-08 13:54:14 -08:00
luojiyin
f42523c55f Fix atomic write in WriteSubnetFile
- Use os.CreateTemp to avoid race conditions with fixed temp filename
- Add f.Sync() before close to ensure data durability
- Check all fmt.Fprintf errors instead of ignoring them
- Preserve original file permissions when overwriting
- Handle dir == "" edge case from filepath.Split
- Check os.MkdirAll error
- Proper cleanup on all error paths

Signed-off-by: luojiyin <luojiyin@hotmail.com>

Add documentation comments to WriteSubnetFile

   Clarify the design choices for atomic file writing:
   - Explain why CreateTemp is used (defense-in-depth, avoids pre-existing file issues)
   - Document the single-instance assumption
   - Note the permission preservation logic

Signed-off-by: luojiyin <luojiyin@hotmail.com>

Update WriteSubnetFile comment to clarify CreateTemp rationale

   Remove misleading reference to concurrent writes (K3s is single-instance).
   Focus on the actual benefits: avoiding stale temp files from crashes,
   handling unexpected permissions/ownership, and O_EXCL guarantees.

Signed-off-by: luojiyin <luojiyin@hotmail.com>

Refactor cleanup to use merr.NewErrors for better error aggregation

   Address review feedback from @brandond to improve error handling:
   - Change cleanup function to accept error parameter
   - Use merr.NewErrors to aggregate original error with Close/Remove errors
   - Simplify error handling with consistent return cleanup(err) pattern

Signed-off-by: luojiyin <luojiyin@hotmail.com>

Fix Close error handling to preserve original error

   Add cleanupNoClose helper to avoid double Close and preserve the
   original Close error when file close fails.

Signed-off-by: luojiyin <luojiyin@hotmail.com>
2026-01-08 11:37:41 -08:00
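The write-to-temp-then-rename pattern this commit series describes can be sketched as follows. This is an illustration under stated assumptions, not the actual `WriteSubnetFile`: the `writeFileAtomic` and `roundTrip` names, the 0644 default mode, and the use of `errors.Join` for aggregating cleanup errors (rather than `merr.NewErrors`) are all choices made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical) writes data to a temp file in the target
// directory, syncs it, and renames it over path.
func writeFileAtomic(path string, data []byte) (err error) {
	dir, name := filepath.Split(path)
	if dir == "" {
		dir = "." // filepath.Split returns an empty dir for a bare filename
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	// Preserve the permissions of an existing file; 0644 is an assumed default.
	mode := os.FileMode(0o644)
	if info, statErr := os.Stat(path); statErr == nil {
		mode = info.Mode().Perm()
	}
	// CreateTemp opens with O_EXCL, so a stale temp file is never reused.
	f, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return err
	}
	tmp := f.Name()
	closed := false
	defer func() {
		if err != nil {
			if !closed {
				err = errors.Join(err, f.Close())
			}
			err = errors.Join(err, os.Remove(tmp))
		}
	}()
	if _, err = f.Write(data); err != nil {
		return err
	}
	if err = f.Sync(); err != nil { // flush to disk before the rename
		return err
	}
	closed = true // Close is attempted exactly once, even if it fails
	if err = f.Close(); err != nil {
		return err
	}
	if err = os.Chmod(tmp, mode); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// roundTrip writes contents into a fresh temp directory and reads it back.
func roundTrip(contents string) (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "nested", "example.env")
	if err := writeFileAtomic(path, []byte(contents)); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err)
}
```

Creating the temp file in the same directory as the target keeps the final rename on one filesystem.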
Brad Davidson
1f2f610b5a Remove flannel external-ip annotations when disabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-07 11:58:56 -08:00
Brad Davidson
0563fc258f Fix etcd reconcile with empty TLS dirs
Reconcile against local etcd would short-circuit and skip reading from the datastore if the cert dirs were missing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
d38b4b30cd Replace temporary etcd server with raw mvcc store access
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Derek Nola
fd48cd6233 Allow k3s secrets-encrypt enable on existing clusters
- Installs an identity provider up front so that encryption can be enabled later
- Update secrets-encryption test
Signed-off-by: Derek Nola <derek.nola@suse.com>
2025-12-30 10:34:23 -08:00
Brad Davidson
e44a77d475 lint: nested-structs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
fc506e56dd lint: unnecessary-format,use-errors-new
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
003fd4471c lint: unhandled-error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
c1f02b8b19 lint: identical-switch-branches
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
8e0e37e303 lint: useless-break
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
91a41d8c30 lint: unnecessary-stmt
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
49d080c7b7 lint: unexported-return
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
46c7ade9e9 lint: unexported-naming
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
62d2737faa lint: unchecked-type-assertion
Adds a generic wrapper around lru.Cache

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
83feb3c31d lint: superfluous-else
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
e416f10e3a lint: struct-tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
291086171b lint: redefines-builtin-id
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
26b4f21479 lint: indent-error-flow
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
4d1ad3d595 lint: import-alias-naming
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
d8af4f162a lint: if-return
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
f279a979b3 lint: exported
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00