226 commits

Author SHA1 Message Date
Ondra Kupka
51ef94c547 controller/nodelifecycle: Improve goroutine mgmt
Make sure all threads are terminated when Run returns.
2025-10-29 19:04:38 +01:00
xigang
574b09b7de nodelifecycle: fix ComputeZoneState method comment
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-09-28 10:56:56 +08:00
Kubernetes Prow Robot
1eb2c4182d Merge pull request #134102 from mayank-agrwl/namespace-nodelifecycle-contextual
Replace HandleError with HandleErrorWithContext
2025-09-23 18:50:17 -07:00
Aditi Gupta
f58d1e101f refactor(controller): Use WithContext variants in cloud node controllers
This change refactors the cloud-specific versions of the node lifecycle
and node IPAM controllers to use a context.Context for cancellation and
contextual logging, replacing the legacy stopCh pattern.

This is a follow-up to PR #133985, where these controllers were
separated out due to their use in the legacy Cloud Controller Manager
(CCM).

It is a known issue that the CCM's startup logic does not pass the
controller name via the context. This change proceeds with the
refactoring to unify the cancellation logic across controllers, while
acknowledging that contextual logs will be less detailed when these
controllers are run in the CCM.

Signed-off-by: Aditi Gupta <aditigpta@google.com>
2025-09-17 00:17:38 -07:00
Mayank Agrawal
d12eeb98d0 Replace HandleError with HandleErrorWithContext
2025-09-16 23:47:23 -07:00
PatrickLaabs
baf71997f5 chore: depr. pointer pkg replacement for pkg/controller
2025-07-07 13:22:36 +02:00
Quan Tian
f718096b74 NoExecute taint should be added when a Node's ready condition becomes Unknown
After a Node has stopped posting heartbeats for nodeMonitorGracePeriod,
it will be considered unreachable, its ready condition will be set to
Unknown, NoSchedule taint will be added, all Pods on it will be set to
NotReady, but there is always a delay of 5s before NoExecute taint is
added to the Node, adding 5s to the recovery time of Pods which are
supposed to be evicted by the taint and recreated on other Nodes sooner.

The delay is because processTaintBaseEviction() uses the last observed
ready condition of the Node instead of the current one to determine
whether it should add the Node to the taint queue. When a Node is set to
unreachable due to missing heartbeats, the last observed ready condition
is still true and the current ready condition is unknown, we should use
the latter for processTaintBaseEviction().

Signed-off-by: Quan Tian <qtian@vmware.com>
2025-05-10 17:22:11 +08:00
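The fix above depends on which Ready condition is consulted. Below is a simplified sketch built on a hypothetical `noExecuteTaintFor` helper, not the real `processTaintBaseEviction`. In the sketch, a Ready status of False maps to the `node.kubernetes.io/not-ready` taint and Unknown maps to `node.kubernetes.io/unreachable`. On the pass where missing heartbeats are first detected, the last observed status is still True, so it yields no taint. The current status (Unknown) yields the unreachable taint right away.

```go
package main

import "fmt"

// ConditionStatus is a local stand-in for the three values of a Node's Ready condition.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// noExecuteTaintFor (hypothetical) returns the NoExecute taint key a node
// should receive for a given Ready status, or "" if it should receive none.
func noExecuteTaintFor(ready ConditionStatus) string {
	switch ready {
	case ConditionFalse:
		return "node.kubernetes.io/not-ready"
	case ConditionUnknown:
		return "node.kubernetes.io/unreachable"
	default:
		return "" // Ready=True: no NoExecute taint
	}
}

func main() {
	// Pass in which heartbeats are first found missing: the controller has
	// just moved Ready from True to Unknown.
	lastObserved := ConditionTrue
	current := ConditionUnknown

	fmt.Printf("using last observed: %q\n", noExecuteTaintFor(lastObserved)) // prints: using last observed: ""
	fmt.Printf("using current: %q\n", noExecuteTaintFor(current))            // prints: using current: "node.kubernetes.io/unreachable"
}
```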
xigang
5c4948ff31 controller: factor out pod node name indexer helper function
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-17 20:21:30 +08:00
Keisuke Ishigami
cdac61b902 use generic set in sig-node
2025-03-11 20:00:15 +09:00
Patrick Ohly
8a908e0c0b remove import doc comments
The "// import <path>" comment has been superseded by Go modules.
We don't have to remove them, but doing so has some advantages:

- They are used inconsistently, which is confusing.
- We can then also remove the (currently broken) hack/update-vanity-imports.sh.
- Last but not least, it would be a first step towards avoiding the k8s.io domain.

This commit was generated with
   sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/')

Everything was included, except for
   package labels // import k8s.io/kubernetes/pkg/util/labels
because that package is marked as "read-only".
2024-12-02 16:59:34 +01:00
Kubernetes Prow Robot
1ac23e24a0 Merge pull request #127956 from carlory/KEP-3902-test
node-lifecycle-controller: improve processPod test-coverage
2024-11-07 08:51:30 +00:00
杨朱 · Kiki
c4814f180a Use k8s.io/kubernetes/test/utils/ktesting
2024-11-07 10:36:13 +08:00
Kubernetes Prow Robot
71523a7db6 Merge pull request #122644 from gyuho/logs-removing-taints
chores(controller/nodelifecycle): make node taint removal logs more a…
2024-10-23 01:17:15 +01:00
carlory
4558dc1432 node-lifecycle-controller: improve processPod test-coverage
2024-10-10 13:52:10 +08:00
Kubernetes Prow Robot
f2700895a4 Merge pull request #127422 from srivastav-abhishek/go-vet-fix
Go vet fixes for gotip
2024-09-20 14:37:58 +01:00
Abhishek Kr Srivastav
95860cff1c Fix Go vet errors for master golang
Co-authored-by: Rajalakshmi-Girish <rajalakshmi.girish1@ibm.com>
Co-authored-by: Abhishek Kr Srivastav <Abhishek.kr.srivastav@ibm.com>
2024-09-20 12:36:38 +05:30
guozheng-shen
686ccceba3 Update node_lifecycle_controller.go
remove 'pod-eviction-timeout' comment
2024-09-09 14:36:41 +08:00
Joe Betz
2595aa1309 generate
2024-09-03 14:26:26 -04:00
devppratik
f8bf6b97b8 Update Node Monitor Grace Period default duration to 50s
Update description

Improve flag comment

Update Test case value to be 50s by default

Update Description

Run make update

Minor description fix
2024-07-24 22:54:44 +05:30
Alvaro Aleman
6d0ac8c561 Use the generic/typed workqueue throughout
This change makes us use the generic workqueue throughout the project in
order to improve type safety and readability of the code.
2024-05-04 14:33:12 -04:00
liyuerich
98dfaed4be drop deprecated workqueue NewNamed package
Signed-off-by: liyuerich <yue.li@daocloud.io>
2024-04-28 14:04:51 +08:00
Kubernetes Prow Robot
67a06c2056 Merge pull request #122293 from mengjiao-liu/controller-reconsider-log-verbosity
kube-controller-manager: readjust log verbosity
2024-02-29 11:55:21 -08:00
Mengjiao Liu
b584b87a94 kube-controller-manager: readjust log verbosity
- Increase the global level for broadcaster's logging to 3 so that users can ignore event messages by lowering the logging level. It reduces information noise.
- Making sure the context is properly injected into the broadcaster, this will allow the -v flag value to be used also in that broadcaster, rather than the above global value.
- test: use cancellation from ktesting
- golangci-hints: checked error return value
2024-02-26 14:51:56 +08:00
fusida
9f6b48f1e7 fix node lifecycle controller panic when conditionType ready is nil
2024-02-26 11:26:45 +08:00
Gyuho Lee
bfc2562d6c chores(controller/nodelifecycle): make node taint removal logs more accurate
Signed-off-by: Gyuho Lee <gyuho@lepton.ai>
2024-01-08 20:54:05 +08:00
Andrea Tosatto
ccda2d6fd4 kube-controller-manager: Decouple TaintManager from NodeLifeCycleController (KEP-3902)
2023-10-30 12:23:56 +00:00
Kubernetes Prow Robot
74098ab5ad Merge pull request #119500 from JackTroy/fix-threshold-arg
Add explanation for large-cluster-size-threshold arg
2023-10-30 02:50:10 +01:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition
2023-10-19 21:00:19 +02:00
Kubernetes Prow Robot
6cbc5dfac6 Merge pull request #114095 from aimuz/fix-114083
scheduler: Fix field apiVersion is missing from events reported from taint manager
2023-08-21 07:03:23 -07:00
jackcui
9d8959224c add explanation for large-cluster-size-threshold arg about multiple zones cluster
2023-07-21 17:25:51 +08:00
Ziqi Zhao
dfc1838379 Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-07-06 07:39:52 +08:00
aimuz
396c8a6783 test: TestPodDeletionEvent
Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-06-16 10:31:33 +08:00
aimuz
975b2c6611 scheduler: Fix field apiVersion is missing from events reported from taint manager
Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-06-16 09:51:25 +08:00
Kubernetes Prow Robot
484645e817 Merge pull request #116659 from claudiubelu/skip-flaky-tests-2
unit tests: Skip flaky tests on Windows (part 2)
2023-05-23 20:04:48 -07:00
Kubernetes Prow Robot
29fe2c70b1 Merge pull request #117252 from alculquicondor/node-lifecycle-owner
Add SIG ownership to controller/nodelifecycle
2023-04-17 14:10:57 -07:00
Claudiu Belu
0979d55443 unit tests: Skip flaky tests on Windows (part 2)
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-04-13 12:07:18 +00:00
Aldo Culquicondor
b23ab389b4 Add SIG ownership to controller/nodelifecycle
Change-Id: I31a329d9ca08bdf12a428cae44a5f061afa01e73
2023-04-12 15:42:06 -04:00
Tim Hockin
29c0b73d64 Replace uses of diff.ObjectDiff with cmp.Diff
ObjectDiff is already a shim over cmp.Diff, so no actual output or
behavior changes
2023-04-12 08:46:12 -07:00
Andrea Tosatto
d09842e0ad node-lifecycle-controller: improve monitorNodeHealth test-coverage (#116687)
* node-lifecycle-controller: refactor monitorNodeHealth tests to improve test-coverage

* address PR review comments

* dedupe test logic
2023-04-12 07:02:33 -07:00
Kubernetes Prow Robot
27e23bad7d Merge pull request #116529 from pohly/controllers-with-name
kube-controller-manager: convert to structured logging
2023-03-14 14:12:55 -07:00
Ziqi Zhao
d1aa73312c pkg/controller/util support contextual logging (#115049)
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-03-14 12:38:14 -07:00
Patrick Ohly
99151c39b7 kube-controller-manager: convert to structured logging
Most of the individual controllers were already converted earlier. Some log
calls were missed or added and then not updated during a rebase. Some of those
get updated here to fill those gaps.

Adding of the name to the logger used by each controller gets
consolidated in this commit. By using the name under which the
controller is registered we ensure that the names in the log
are consistent.
2023-03-14 19:16:32 +01:00
Damien Grisonnet
ac394c5c19 Cleanup deprecated metrics
Remove the following deprecated metrics:
- node_collector_evictions_number
- scheduler_e2e_scheduling_duration_seconds

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2023-03-13 22:55:34 +01:00
Andrea Tosatto
cae19f9e85 Remove deprecated pod-eviction-timeout flag from controller-manager
2023-03-07 18:14:18 +00:00
kerthcet
e5c812bbe7 Remove CLI flag enable-taint-manager
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-03-07 18:11:49 +00:00
Kubernetes Prow Robot
37326f7cea Merge pull request #112670 from yangjunmyfm192085/delklogV0
use contextual logging(nodeipam and nodelifecycle part)
2023-03-07 09:40:33 -08:00
JunYang
780ef3afb0 use klog.InfoS instead of klog.V(0).Info
2023-03-07 15:50:01 +08:00
Claudiu Belu
5ba74c81ca unit tests: Skip flaky tests on Windows
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-03-06 20:46:05 +00:00
binacs
84ff621309 cleanup(controller): use IsSuperset to avoid interim slice
2023-02-19 21:49:58 +08:00
JunYang
29086e2b04 use klog instead of klog.V(0)
2023-01-14 21:15:50 +08:00