Commit graph

169 commits

Author SHA1 Message Date
Brad Davidson
3f5eec4c4e Drop use of github.com/gorilla/mux
mux is replaced with a simple wrapper around http.ServeMux with middleware chain support

Unfortunately github.com/rootless-containers/rootlesskit/pkg/parent
still uses it so we can't drop the indirect dep yet.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
3acf8db8f2 Update packages to remove dep on archived github.com/pkg/errors
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
d300004f29 Improve resilience of datastore bootstrap reconcile from etcd
Some checks are pending
govulncheck / govulncheck (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
* Add store tests with fixtures
* Try connecting to local etcd first, if it is available
* Handle panics from etcd backend code
* Don't try to read WAL and restore v3 snapshots as they almost never exist

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-26 10:13:04 -08:00
Brad Davidson
b3962bd057 Fix restart of control-plane-only nodes attempting to reconcile from local datastore
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 18:39:02 -08:00
Brad Davidson
0563fc258f Fix etcd reconcile with empty TLS dirs
Reconcile against local etcd would short-circuit and skip reading from the datastore if the cert dirs were missing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
d38b4b30cd Replace temporary etcd server with raw mvcc store access
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
fc506e56dd lint: unnecessary-format,use-errors-new
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
8e0e37e303 lint: useless-break
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
26b4f21479 lint: indent-error-flow
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
d8af4f162a lint: if-return
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
7146e2000e Fix apiserver starting before remote etcd is up
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Has been cancelled
Fixes issue where the apiserver on control-plane-only nodes does not
actually wait for a connection to etcd to be available before starting.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-07 10:32:02 -08:00
Brad Davidson
d8790220ff Move node password secrets into dedicated controller
Some checks are pending
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
Move the node password secret cleanup into its own dedicated controller
that also handles auth. We now use a filtered cache of only
node-password secrets, instead of using the wrangler secret cache,
which stores all secrets from all namespaces.

The coredns node-hosts controller also now uses a single-resource
watch cache on the coredns configmap, instead of reading it from
the apiserver every time a node changes.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-10-27 15:06:45 -07:00
Brad Davidson
0d9ef273d8 Remove node addresses from filter when node is deleted
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-10-27 15:06:45 -07:00
Brad Davidson
d6e84ba2d1 Fix kine metrics registration without --kine-tls
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-24 15:18:34 -07:00
Brad Davidson
4974fc7c24 Use sync.WaitGroup to avoid exiting before components have shut down
Currently only waits on etcd and kine, as other components
are stateless and do not need to shut down cleanly.

Terminal but non-fatal errors now request shutdown via context
cancellation, instead of just logging a fatal error.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Brad Davidson
a7d3c8559f Fix IPv6 handling for loadbalancer addresses
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-16 11:04:22 -07:00
Derek Nola
56ef1cd3a2
Update etcd to v3.6.4-k3s3
* Raft is now an independent dependency, with a seperate release version
* errors moved into their own subpackage
* set a default WarningUnaryRequestDuration

Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
2025-09-04 17:33:10 -06:00
Brad Davidson
c837bfcdc7 Bump kine for metrics panic fix
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-03 09:52:51 -07:00
bo.jiang
b5f4fd1d73 Fix K3s not validating datastore connection when no token is set
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2025-06-05 12:49:26 -07:00
Vitor Savian
dc03cb4b3f
Update k8s version to 1.33
* Update to 1.33

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Fix prints that broke unit tests

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Change binary max size to 75

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Change containerd version to fix misspelling

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Address binary size comment

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Update Dependencies

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Remove dependencie not used anymore

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2025-04-30 04:43:37 -03:00
Brad Davidson
3f7e6a30ce Move delegating auth middleware into common package and add MaxInFlight
Adds maximum in-flight request limits to agent join and p2p peer info
request request handlers.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-17 14:20:39 -07:00
Derek Nola
aea3703f68 Implement secrets-encryption secretbox provider
- Add testlet for new provider switch
- Handle migration between providers
- Add exception for criticalcontrolargs
Signed-off-by: Derek Nola <derek.nola@suse.com>
2025-04-07 09:08:22 -07:00
Brad Davidson
1ba19856de Add tests for control-plane component arg generation
Use mocked executor to ensure the correct args are being passed to components

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-31 17:09:43 -07:00
Brad Davidson
d45006be66 Move etcd ready channel into executor
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
529e748ac7 Move apiserver ready wait into common channel
Splits server startup into prepare/start phases. Server's agent is now
started after server is prepared, but before it is started. This allows
us to properly bootstrap the executor before starting server components,
and use the executor to provide a shared channel to wait on apiserver
readiness.

This allows us to replace four separate callers of WaitForAPIServerReady
with reads from a common ready channel.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
d694dd1db9 Add periodic background snapshot reconcile
Interval is configurable with new etcd-snapshot-reconcile-interval flag

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 12:18:19 -08:00
Brad Davidson
bed1f66880 Avoid use of github.com/pkg/errors functions that capture stack
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 00:41:38 -08:00
Brad Davidson
f940368747 Use etcd proxy to bootstrap control-plane-only nodes, if possible
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
244bfd0c35 Use existing server-CA and hash if available
Also wraps errors along the cluster prepare path to improve tracability.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
53fcadc028 Serve HTTP bootstrap data from datastore before disk
Fixes issue where CA rotation would fail on servers with join URL set due to using old data from disk on other server

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
e6327652f0 Replace *core.Factory with CoreFactory interface
Make this field an interface instead of pointer to allow mocking. Not sure why wrangler has a type that returns an interface instead of just making it an interface itself. Wrangler in general is hard to mock for testing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-09 00:51:19 -08:00
Brad Davidson
95797c4a79 Refactor filterCN to use a Set instead of map[string]bool
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
manuelbuil
4ec261733e If no etcd was deployed, fail etcd-snapshot with a useful error
Signed-off-by: manuelbuil <mbuil@suse.com>
2024-11-28 09:40:42 +01:00
Brad Davidson
fe3324cb84 Fix rotateca validation failures when not touching default self-signed CAs
Also silences warnings about bootstrap fields that are not intended to be handled by CA rotation

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-08-22 14:47:40 -07:00
Will Andrews
e2179aa957 Update pkg/cluster/managed.go
Co-authored-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Will Andrews <will7989@hotmail.com>
2024-07-29 16:23:17 -07:00
Will
e4f3cc7b54 remove deprecated use of wait functions
Signed-off-by: Will <will7989@hotmail.com>
2024-07-29 16:23:17 -07:00
Brad Davidson
f8e0648304 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson
ff679fb3ab Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson
3d14092f76 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 15:48:15 -07:00
Hussein Galal
144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson
7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson
fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Vitor Savian
5d69d6e782 Add tls for kine
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-28 11:12:07 -03:00
Brad Davidson
c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson
82e3c32c9f Retry startup snapshot reconcile
The reconcile may run before the kubelet has created the node object; retry until it succeeds

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-06 17:46:24 -08:00
Brad Davidson
7ecd5874d2 Skip initial datastore reconcile during cluster-reset
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-15 14:31:44 -08:00
Brad Davidson
d885162967 Add server token hash to CR and S3
This required pulling the token hash stuff out of the cluster package, into util.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
7464007037 Store extra metadata and cluster ID for snapshots
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Derek Nola
dface01de8
Server Token Rotation (#8265)
* Consolidate NewCertCommands
* Add support for user defined new token
* Add E2E testlets

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Ensure agent token also changes

Signed-off-by: Derek Nola <derek.nola@suse.com>
2023-10-09 10:58:49 -07:00
Manuel Buil
f2c7117374 Take IPFamily precedence based on order
Signed-off-by: Manuel Buil <mbuil@suse.com>
2023-09-29 11:04:15 +02:00