Commit graph

63 commits

Author SHA1 Message Date
Brad Davidson
3acf8db8f2 Update packages to remove dep on archived github.com/pkg/errors
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-03-09 16:09:01 -07:00
Brad Davidson
d300004f29 Improve resilience of datastore bootstrap reconcile from etcd
Some checks are pending
govulncheck / govulncheck (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
* Add store tests with fixtures
* Try connecting to local etcd first, if it is available
* Handle panics from etcd backend code
* Don't try to read WAL and restore v3 snapshots as they almost never exist

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-02-26 10:13:04 -08:00
Brad Davidson
b3962bd057 Fix restart of control-plane-only nodes attempting to reconcile from local datastore
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-30 18:39:02 -08:00
Brad Davidson
0563fc258f Fix etcd reconcile with empty TLS dirs
Reconcile against local etcd would short-circuit and skip reading from the datastore if the cert dirs were missing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
d38b4b30cd Replace temporary etcd server with raw mvcc store access
Fixes an issue where copying files out from under a currently-running etcd instance can cause startup reconcile to fail. Direct creation of a mvcc store without any of the raft stuff is faster, and gives us direct control over how the store handles snapshot recovery.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2026-01-05 09:59:29 -08:00
Brad Davidson
8e0e37e303 lint: useless-break
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
26b4f21479 lint: indent-error-flow
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-12-18 11:20:07 -08:00
Brad Davidson
7146e2000e Fix apiserver starting before remote etcd is up
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Has been cancelled
Fixes issue where the apiserver on control-plane-only nodes does not
actually wait for a connection to etcd to be available before starting.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-11-07 10:32:02 -08:00
Brad Davidson
4974fc7c24 Use sync.WaitGroup to avoid exiting before components have shut down
Currently only waits on etcd and kine, as other components
are stateless and do not need to shut down cleanly.

Terminal but non-fatal errors now request shutdown via context
cancellation, instead of just logging a fatal error.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-17 09:37:08 -07:00
Vitor Savian
dc03cb4b3f
Update k8s version to 1.33
* Update to 1.33

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Fix prints that broke unit tests

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Change binary max size to 75

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Change containerd version to fix misspelling

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Address binary size comment

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Update Dependencies

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Remove dependencie not used anymore

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2025-04-30 04:43:37 -03:00
Derek Nola
aea3703f68 Implement secrets-encryption secretbox provider
- Add testlet for new provider switch
- Handle migration between providers
- Add exception for criticalcontrolargs
Signed-off-by: Derek Nola <derek.nola@suse.com>
2025-04-07 09:08:22 -07:00
Brad Davidson
bed1f66880 Avoid use of github.com/pkg/errors functions that capture stack
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 00:41:38 -08:00
Brad Davidson
f940368747 Use etcd proxy to bootstrap control-plane-only nodes, if possible
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
244bfd0c35 Use existing server-CA and hash if available
Also wraps errors along the cluster prepare path to improve tracability.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
53fcadc028 Serve HTTP bootstrap data from datastore before disk
Fixes issue where CA rotation would fail on servers with join URL set due to using old data from disk on other server

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
fe3324cb84 Fix rotateca validation failures when not touching default self-signed CAs
Also silences warnings about bootstrap fields that are not intended to be handled by CA rotation

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-08-22 14:47:40 -07:00
Brad Davidson
7ecd5874d2 Skip initial datastore reconcile during cluster-reset
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-15 14:31:44 -08:00
Brad Davidson
d885162967 Add server token hash to CR and S3
This required pulling the token hash stuff out of the cluster package, into util.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
7464007037 Store extra metadata and cluster ID for snapshots
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
002e6c43ee Reorganize Driver interface and etcd driver to avoid passing context and config into most calls
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson
d95980bba3 Lock bootstrap data with empty key to prevent conflicts
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-05 10:56:57 -07:00
Brad Davidson
992e64993d Add support for kubeadm token and client certificate auth
Allow bootstrapping with kubeadm bootstrap token strings or existing
Kubelet certs. This allows agents to join the cluster using kubeadm
bootstrap tokens, as created with the `k3s token create` command.

When the token expires or is deleted, agents can successfully restart by
authenticating with their kubelet certificate via node authentication.
If the token is gone and the node is deleted from the cluster, node auth
will fail and they will be prevented from rejoining the cluster until
provided with a valid token.

Servers still must be bootstrapped with the static cluster token, as
they will need to know it to decrypt the bootstrap data.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-07 14:55:04 -08:00
Derek Nola
13c633da12
Add Secrets Encryption to CriticalArgs (#6409)
* Add EncryptSecrets to Critical Control Args
* use deep comparison to extract differences

Signed-off-by: Derek Nola <derek.nola@suse.com>

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-11-04 10:35:29 -07:00
iyear
3aae7b8783 Fix incorrect defer usage
Problem:
Using defer inside a loop can lead to resource leaks

Solution:
Judge newer file in the separate function

Signed-off-by: iyear <ljyngup@gmail.com>
2022-11-01 16:23:25 -07:00
Derek Nola
06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Brad Davidson
fc1c100ffd Remove legacy bidirectional datastore sync code
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson
83420ef78e Fix fatal error when reconciling bootstrap data
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson
96162c07c5 Handle egress-selector-mode change during upgrade
Properly handle unset egress-selector-mode from existing servers during cluster upgrade.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-30 11:57:41 -07:00
Brad Davidson
1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Brad Davidson
3cebde924b Handle empty entries in bootstrap path map
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-17 13:42:27 -07:00
Luther Monson
9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson
9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson
a1b800f0bf Remove unnecessary copies of etcdconfig struct
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Brad Davidson
2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Brad Davidson
5ca206ad3b Fix handling of agent-token fallback to token
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-07 09:56:37 -08:00
Brad Davidson
e7464a17f7 Fix use of agent creds for secrets-encrypt and config validate
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-06 12:55:18 -08:00
Brian Downs
3ae550ae51
Update bootstrap logic to output all changed files on disk (#4800) 2021-12-21 14:28:32 -07:00
Brad Davidson
8ad7d141e8 Close etcd clients to avoid leaking GRPC connections
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-17 23:55:17 -08:00
Derek Nola
17eebe0563
Fix cold boot and reconcilation on secondary servers (#4747)
* Enable reconcilation on secondary servers

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Remove unused code

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Attempt to reconcile with datastore first

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added warning on failure

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Update warning

Signed-off-by: Derek Nola <derek.nola@suse.com>

* golangci-lint fix

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-15 15:38:50 -08:00
Hussein Galal
d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs
bf4e037fcf
Resolve Bootstrap Migration Edge Case (#4730) 2021-12-13 13:02:30 -07:00
Brian Downs
a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Manuel Buil
1e0696628e
Merge pull request #4581 from manuelbuil/checking-HA-parameters
Verify new control plane nodes joining the cluster share the same config as cluster members
2021-12-08 10:49:28 +01:00
Derek Nola
bcb662926d
Secrets-encryption rotation (#4372)
* Regular CLI framework for encrypt commands
* New secrets-encryption feature
* New integration test
* fixes for flaky integration test CI
* Fix to bootstrap on restart of existing nodes
* Consolidate event recorder

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-07 14:31:32 -08:00
Manuel Buil
1b3187ea07 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-07 23:09:05 +01:00
Chris Kim
f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs
adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
Brian Downs
0a0b915921
reset buffer after use (#4279) 2021-10-22 15:56:01 -07:00