Fixes issue where RKE2 event recorder events were not logged to console due to lack of logging context.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Found a typo while working, quick fix.
It should display "This server is not a member of the etcd cluster" instead of "this server is a not a member of the etcd cluster"
Kind regards,
Signed-off-by: DT1mote <74531281+DT1mote@users.noreply.github.com>
Compression creates a zipfile with the same path as the snapshot file
containing only the snapshot. Decompression can be a bit simpler by also
extracting to the same path, and erroring if there are unexpected
contents.
In retrospect we probably should have just gzip'd the snapshot file, but
I think there was some intention to observe the same behavior as RKE1,
which used zip files.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
mux is replaced with a simple wrapper around http.ServeMux with middleware chain support
Unfortunately github.com/rootless-containers/rootlesskit/pkg/parent
still uses it so we can't drop the indirect dep yet.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
etcd.setName was being called during managed driver creation, even if the managed driver (etcd) is not in use. Let etcd.Register handle calling setName.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Add support for the "nix" snapshotter, which enables running container
images built with nix2container. Nix images reference store paths
directly, avoiding layer tarballs and enabling deduplication through
the nix store.
Changes:
- Register nix-snapshotter as a builtin containerd plugin
- Add NixSupported() validation (checks nix-store is in PATH)
- Configure nix-snapshotter image service proxy in V2/V3 templates
with containerd_address for CRI image operations
- Add Transfer service unpack_config with differ=walking for
multi-arch support
- Use containerd state dir for socket path (rootless compatible)
- Disable NRI in rootless mode to prevent bind failures
Usage: k3s server --snapshotter nix
Signed-off-by: Ada <ada@6bit.com>
Co-Authored-By: Joshua Perry <josh@6bit.com>
Signed-off-by: Ada <ada@6bit.com>
Add imports to the generated containerd config so containerd loads
drop-in TOML files: config.toml.d for v2, config-v3.toml.d for v3
(e.g. /var/lib/rancher/k3s/agent/etc/containerd/config.toml.d and
/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d).
Also fix the v3 header comment to say config-v3.toml.tmpl instead
of config.toml.tmpl.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
* Add store tests with fixtures
* Try connecting to local etcd first, if it is available
* Handle panics from etcd backend code
* Don't try to read WAL and restore v3 snapshots as they almost never exist
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Removing the initial node from the cluster would previously cause etcd to panic on startup. Fixes to etcd reconcile have stopped that from happening, but now the node will successfully come up and start a new cluster - which is not right either. Require either manual removal of DB files to create a new cluster, or setting server address to join an existing cluster.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Not currently used, but was useful in tracking down the specific call path for the empty token handling
Prints error as:
> `msg="Error: starting kubernetes: failed to start cluster: failed to normalize server token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASSWORD> at github.com/urfave/cli/v2.(*App).RunContext(app.go:333)->github.com/urfave/cli/v2.(*Command).Run(command.go:269)->github.com/urfave/cli/v2.(*Command).Run(command.go:276)->github.com/k3s-io/k3s/pkg/cli/server.Run(server.go:48)->github.com/k3s-io/k3s/pkg/cli/server.run(server.go:629)->github.com/k3s-io/k3s/pkg/server.StartServer(server.go:74)->github.com/k3s-io/k3s/pkg/daemons/control.Server(server.go:72)->github.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start(cluster.go:75)->github.com/k3s-io/k3s/pkg/cluster.Save(storage.go:79)->github.com/k3s-io/k3s/pkg/util.NormalizeToken(token.go:51)"`
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes issue preventing containerd from accessing spegel on ipv6-primary agents. Only affects agents because only agents use the kubelet-serving cert for the supervisor listener.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Spegel insists on checking containerd features when the store is created, so defer creating it until after contaienerd is up
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
- Use os.CreateTemp to avoid race conditions with fixed temp filename
- Add f.Sync() before close to ensure data durability
- Check all fmt.Fprintf errors instead of ignoring them
- Preserve original file permissions when overwriting
- Handle dir== edge case from filepath.Split
- Check os.MkdirAll error
- Proper cleanup on all error paths
Signed-off-by: luojiyin <luojiyin@hotmail.com>
Add documentation comments to WriteSubnetFile
Clarify the design choices for atomic file writing:
- Explain why CreateTemp is used (defense-in-depth, avoids pre-existing file issues)
- Document the single-instance assumption
- Note the permission preservation logic
Signed-off-by: luojiyin <luojiyin@hotmail.com>
Update WriteSubnetFile comment to clarify CreateTemp rationale
Remove misleading reference to concurrent writes (K3s is single-instance).
Focus on the actual benefits: avoiding stale temp files from crashes,
handling unexpected permissions/ownership, and O_EXCL guarantees.
Signed-off-by: luojiyin <luojiyin@hotmail.com>
Refactor cleanup to use merr.NewErrors for better error aggregation
Address review feedback from @brandond to improve error handling:
- Change cleanup function to accept error parameter
- Use merr.NewErrors to aggregate original error with Close/Remove errors
- Simplify error handling with consistent return cleanup(err) pattern
Signed-off-by: luojiyin <luojiyin@hotmail.com>
Fix Close error handling to preserve original error
Add cleanupNoClose helper to avoid double Close and preserve the
original Close error when file close fails.
Signed-off-by: luojiyin <luojiyin@hotmail.com>
Reconcile against local etcd would short-circuit and skip reading from the datastore if the cert dirs were missing.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>