Commit graph

47106 commits

Author SHA1 Message Date
Miciah Masters
fc18ffe58d TopologyAwareHints: Take lock in HasPopulatedHints
Prevent potential concurrent map access by taking a lock before reading the
topology cache's hintsPopulatedByService map.

* staging/src/k8s.io/endpointslice/topologycache/topologycache.go
(setHintsLocked, hasPopulatedHintsLocked): New helper functions.  These are
the same as the existing SetHints and HasPopulatedHints methods except that
these helpers assume that a lock is already held.
(SetHints): Use setHintsLocked.
(HasPopulatedHints): Take a lock and use hasPopulatedHintsLocked.
(AddHints): Take a lock and use setHintsLocked and hasPopulatedHintsLocked.
* staging/src/k8s.io/endpointslice/topologycache/topologycache_test.go
(TestTopologyCacheRace): Add a goroutine that calls HasPopulatedHints.
2023-08-31 16:38:51 -04:00
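The locking pattern this commit describes can be sketched as follows — a minimal standalone version in which the method and field names follow the commit message, while the rest of the real TopologyCache is omitted:

```go
package main

import (
	"fmt"
	"sync"
)

// TopologyCache is a pared-down stand-in for the real type. Exported
// methods take the lock; the *Locked helpers assume the caller already
// holds it, so a method like AddHints can perform several steps inside
// one critical section without self-deadlocking.
type TopologyCache struct {
	lock                    sync.Mutex
	hintsPopulatedByService map[string]bool
}

func (t *TopologyCache) setHintsLocked(service string, populated bool) {
	t.hintsPopulatedByService[service] = populated
}

func (t *TopologyCache) hasPopulatedHintsLocked(service string) bool {
	return t.hintsPopulatedByService[service]
}

// SetHints and HasPopulatedHints are thin wrappers that take the lock
// before calling the helpers, so concurrent readers and writers of the
// map never race.
func (t *TopologyCache) SetHints(service string, populated bool) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.setHintsLocked(service, populated)
}

func (t *TopologyCache) HasPopulatedHints(service string) bool {
	t.lock.Lock()
	defer t.lock.Unlock()
	return t.hasPopulatedHintsLocked(service)
}

func main() {
	t := &TopologyCache{hintsPopulatedByService: map[string]bool{}}
	t.SetHints("ns/svc", true)
	fmt.Println(t.HasPopulatedHints("ns/svc"))
}
```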
Kubernetes Prow Robot
32842f1d00
Merge pull request #120054 from ruiwen-zhao/automated-cherry-pick-of-#119986-upstream-release-1.27
Automated cherry pick of #119986: Pass Pinned field to kubecontainer.Image
2023-08-30 08:44:47 -07:00
Kubernetes Prow Robot
4737b87479
Merge pull request #119432 from ffromani/automated-cherry-pick-of-#118635-upstream-release-1.27-1689697264
[1.27]  kubelet: devices: skip allocation for running pods #118635
2023-08-30 07:42:47 -07:00
Michal Wozniak
3449ab21b4 Mark Job onPodConditions as optional in pod failure policy 2023-08-28 17:19:14 +02:00
Kubernetes Prow Robot
38c97fa67e
Merge pull request #120135 from ritazh/cherry-pick-cve-2023-3955-1.27
Cherry pick of #120128 Use environment variables for parameters in Powershell
2023-08-23 12:35:57 -07:00
James Sturtevant
acc29048e6
Use environment variables for parameters in PowerShell
As a defense in depth, pass parameters to PowerShell via environment variables.

Signed-off-by: James Sturtevant <jstur@microsoft.com>
2023-08-23 07:00:53 -07:00
James Sturtevant
172644fb55
Use env variables for passing path
The subpath could contain a PowerShell subexpression, which kubelet would execute with privilege. Passing the arguments via environment variables instead means the subexpression won't be evaluated.

Signed-off-by: James Sturtevant <jstur@microsoft.com>
2023-08-23 06:39:13 -07:00
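The defense these two commits describe can be illustrated with a small sketch. The helper name is hypothetical, and sh stands in for PowerShell so the example runs anywhere; the principle — untrusted values travel through the environment rather than the command string — is the same:

```go
package main

import (
	"fmt"
	"os/exec"
)

// echoViaEnv shows the technique: the untrusted value is delivered via an
// environment variable, so the shell prints it verbatim instead of
// evaluating any subexpression embedded in it.
func echoViaEnv(value string) string {
	// Unsafe alternative, for contrast:
	//   exec.Command("sh", "-c", "printf '%s' "+value)
	// would let the shell evaluate $(...) inside value.
	cmd := exec.Command("sh", "-c", `printf '%s' "$SUB_PATH"`)
	cmd.Env = append(cmd.Env, "SUB_PATH="+value)
	out, err := cmd.Output()
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// The subexpression survives as literal text instead of being executed.
	fmt.Println(echoViaEnv(`$(whoami)`))
}
```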
ruiwen-zhao
78e14a2d80 Pass Pinned field to kubecontainer.Image
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-08-18 17:38:43 +00:00
Jordan Liggitt
3b6bcaa0b9
Avoid returning nil responseKind in v1beta1 aggregated discovery 2023-08-09 14:43:56 -04:00
Francesco Romani
e5512149e2 node: devicemgr: topomgr: add logs
One of the factors that make issues #118559 and #109595 hard to
debug and fix is that the devicemanager has very few logs in important
flows, so it's unnecessarily hard to reconstruct the state from logs.

We add minimal logs to improve troubleshooting, keeping the change
backport-friendly and deferring a more comprehensive review of logging
to later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-08-08 13:36:40 +02:00
Francesco Romani
34f2a5803a kubelet: devices: skip allocation for running pods
When kubelet initializes, it runs admission for pods and possibly
allocates requested resources. We need to distinguish between a
node reboot (no containers running) and a kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitly. We need to inform
the devicemanager about which pods are already running.

Note that if the container runtime is down when kubelet restarts, the
approach implemented here won't work: on kubelet restart, containers
will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should, however, be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-08-08 13:36:13 +02:00
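The idea in the commit can be sketched as follows — a deliberately minimal model with hypothetical names, not the real devicemanager API: kubelet tells the device manager which pods were already running at restart, and the allocation step skips those pods because their containers already hold working devices.

```go
package main

import "fmt"

type deviceManager struct {
	runningPods map[string]bool // pod UIDs running before kubelet restarted
	allocated   map[string]int  // devices allocated per pod UID
}

// Allocate skips pods that were already running when kubelet restarted:
// their containers already have devices allocated and working, and
// re-allocating on admission could fail even though the pod is healthy.
func (m *deviceManager) Allocate(podUID string, devices int) {
	if m.runningPods[podUID] {
		return
	}
	m.allocated[podUID] += devices
}

func main() {
	m := &deviceManager{
		runningPods: map[string]bool{"pod-a": true},
		allocated:   map[string]int{},
	}
	m.Allocate("pod-a", 2) // already running: allocation skipped
	m.Allocate("pod-b", 2) // newly admitted: allocated
	fmt.Println(m.allocated["pod-a"], m.allocated["pod-b"])
}
```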
Kubernetes Prow Robot
de56018f04
Merge pull request #117269 from tnqn/automated-cherry-pick-of-#117245-#117249-upstream-release-1.27
Automated cherry pick of #117245: Fix TopologyAwareHint not working when zone label is added
#117249: Fix a data race in TopologyCache
2023-08-04 13:26:31 -07:00
Kubernetes Prow Robot
521580378a
Merge pull request #119363 from jsafrane/automated-cherry-pick-of-#117804-upstream-release-1.27
Automated cherry pick of #117804: Refactor FindAttachablePluginBySpec out of CSI code path
2023-08-04 11:58:08 -07:00
Kubernetes Prow Robot
2ac615ccde
Merge pull request #117235 from cvvz/automated-cherry-pick-of-#116134-origin-release-1.27
Automated cherry pick of #116134: fix: after a Node goes down and takes some time to come back up, the mount points of the evicted Pods cannot be cleaned up successfully.
2023-08-02 05:32:44 -07:00
Kubernetes Prow Robot
559f43d49c
Merge pull request #119466 from mimowo/automated-cherry-pick-of-#119434-upstream-release-1.27
Automated cherry pick of #119434: Include ignored pods when computing backoff delay for Job pod
2023-08-02 04:36:54 -07:00
Amir Alavi
db832fdfa6
fix 'pod' in kubelet prober metrics 2023-07-26 21:44:00 -04:00
Michal Wozniak
ed0cdc9e0b Include ignored pods when computing backoff delay for Job pod failures
# Conflicts:
#	pkg/controller/job/job_controller.go
2023-07-21 09:31:49 +02:00
Michal Wozniak
ae24a5cf74 Remarks 2023-07-21 09:29:47 +02:00
Michal Wozniak
9e1050b4d9 Adjust the algorithm for computing the pod finish time
Change-Id: Ic282a57169cab8dc498574f08b081914218a1039
2023-07-20 16:29:26 +02:00
Jan Safranek
aefc4d0392 Rename updateReconstructedFromAPIServer
to be in sync with volumesNeedUpdateFromNodeStatus.
2023-07-17 11:18:48 +02:00
Jan Safranek
eeba02fc62 Rename volumesNeedDevicePath
To volumesNeedUpdateFromNodeStatus - because both devicePath and uncertain
attach-ability need to be fixed from node status.
2023-07-17 11:18:48 +02:00
Jan Safranek
5eb3b748e8 Update volumesInUse after attachability is confirmed
node.status.volumesInUse should report only attachable volumes, therefore
it needs to wait for the reconciler to update uncertain attachability of
volumes from the API server.
2023-07-17 11:18:48 +02:00
Jan Safranek
f8bb161ab5 Add uncertain state of volume attach-ability
During CSI volume reconstruction it's not possible to tell whether the volume
is attachable or not: the CSIDriver instance may not be available, because
kubelet may not have a connection to the API server at that time.

Add an uncertain state during reconstruction, and set the correct state once
the API server is available.
2023-07-17 11:18:48 +02:00
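The uncertain state this commit adds can be sketched as a tri-state value; the names below are hypothetical, not the actual kubelet types:

```go
package main

import "fmt"

// attachability models the tri-state: during reconstruction kubelet may
// have no API server connection, so it cannot yet read the CSIDriver
// object that says whether the volume needs attach/detach.
type attachability int

const (
	attachabilityUncertain attachability = iota // set during reconstruction
	attachabilityTrue                           // confirmed from the API server
	attachabilityFalse
)

type reconstructedVolume struct {
	name       string
	attachable attachability
}

// updateFromAPIServer resolves the uncertain state once the CSIDriver
// object becomes readable.
func (v *reconstructedVolume) updateFromAPIServer(requiresAttach bool) {
	if requiresAttach {
		v.attachable = attachabilityTrue
	} else {
		v.attachable = attachabilityFalse
	}
}

func main() {
	v := &reconstructedVolume{name: "vol-1", attachable: attachabilityUncertain}
	// Later, when the API server is reachable:
	v.updateFromAPIServer(true)
	fmt.Println(v.attachable == attachabilityTrue)
}
```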
Jan Safranek
08b7937d25 Refactor FindAttachablePluginBySpec out of CSI code path
reconstructVolume() is called when kubelet may not have a connection to the
API server yet, so it cannot get CSIDriver instances to figure out
whether a CSI volume is attachable or not.

Refactor reconstructVolume(), so it does not need
FindAttachablePluginBySpec for CSI volumes, because all of them are
deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the
CSI volume plugin).
2023-07-17 11:18:48 +02:00
Kubernetes Prow Robot
5ee5d7346e
Merge pull request #119096 from aleksandra-malinowska/automated-cherry-pick-of-#117865-upstream-release-1.27
Automated cherry pick of #117865: Parallel StatefulSet pod create & delete
2023-07-12 16:31:33 -07:00
Kubernetes Prow Robot
b5c876a05b
Merge pull request #117226 from princepereira/automated-cherry-pick-of-#116749-upstream-release-1.27
Automated cherry pick of #116749: Adding additional validations to queried endpoint
2023-07-11 22:53:13 -07:00
Aleksandra Malinowska
28c79be674 Add unit tests for parallel StatefulSet create & delete 2023-07-10 12:31:07 +02:00
Aleksandra Malinowska
66f980be12 Parallel StatefulSet pod create & delete 2023-07-10 12:31:07 +02:00
Aleksandra Malinowska
288504fbf8 Refactor StatefulSet controller update logic 2023-07-10 12:31:07 +02:00
Aldo Culquicondor
92a0f58e2b
Only declare job as finished after removing all finalizers
Change-Id: Id4b01b0e6fabe24134e57e687356e0fc613cead4
2023-07-07 14:31:02 -04:00
Aldo Culquicondor
c655001fa4
Automated cherry pick of #118716 upstream release 1.27 (#118911)
* Skip terminal Pods with a deletion timestamp from the Daemonset sync

Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742

* Review comments and fix integration test

Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713

* Include deleted terminal pods in history

Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301

* Exclude terminal pods from Daemonset e2e tests

Change-Id: Ic29ca1739ebdc54822d1751fcd56a99c628021c4
2023-07-06 18:57:02 -07:00
Kubernetes Prow Robot
b667da8e08
Merge pull request #118683 from serathius/automated-cherry-pick-of-#118460-origin-release-1.27
Automated cherry pick of #118460: Make etcd component status consistent with health probes
2023-07-06 18:01:03 -07:00
Kubernetes Prow Robot
f8c1cc33cb
Merge pull request #119139 from kmala/1.27
Update schedule logic to properly calculate missed schedules
2023-07-06 16:55:03 -07:00
Kubernetes Prow Robot
5bbacb1198
Merge pull request #118290 from HirazawaUi/automated-cherry-pick-of-#118177-upstream-release-1.27
Automated cherry pick of #118177: Fix the git-repo test error caused by the incorrect use of loop variables
2023-07-06 10:31:03 -07:00
Maciej Szulik
b383755e46 Hide numberOfMissedSchedules as an algorithm internal number 2023-07-06 10:21:55 -07:00
Maciej Szulik
26db84e04c Update schedule logic to properly calculate missed schedules
Before this change we assumed a constant time between schedule runs,
which is not true for cases like "30 6-16/4 * * 1-5".
The fix is to calculate the potential next run using the fixed schedule
as the baseline, then step back one schedule and let the cron
library calculate the correct time.

This approach saves us from iterating multiple times between the last
schedule time and now if the cronjob, for any reason, wasn't running for
a significant amount of time.
2023-07-06 10:21:43 -07:00
aleskandro
703edddae4 Updating the nodeAffinity of gated pods having nil affinity should be allowed 2023-07-02 11:17:05 +02:00
Kubernetes Prow Robot
3b874af387
Merge pull request #118662 from mkowalski/automated-cherry-pick-of-#118329-upstream-release-1.27
Automated cherry pick of #118329: Set the node-ips annotation correctly with CloudDualStackNodeIPs
2023-06-30 02:25:45 -07:00
Kubernetes Prow Robot
d936e6669b
Merge pull request #118841 from bobbypage/automated-cherry-pick-of-#118497-upstream-release-1.27
Automated cherry pick of #118497: Fix the deletion of rejected pods
2023-06-29 06:09:36 -07:00
Kubernetes Prow Robot
76b9400cea
Merge pull request #118283 from pohly/automated-cherry-pick-of-#118257-origin-release-1.27
Automated cherry pick of #118257: dra scheduler plugin test: fix loopvar bug and "reserve"
2023-06-29 03:03:37 -07:00
Michal Wozniak
5423fffca9 Review remarks to improve HandlePodCleanups in kubelet 2023-06-23 14:37:24 -07:00
Michal Wozniak
24c67c1524 Fix the deletion of rejected pods 2023-06-23 14:37:24 -07:00
Heba Elayoty
62cf5ee1cd
Unset gated pod info timestamp in addToActiveQ
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-22 15:34:22 -07:00
Marek Siarkowicz
ea2af58b5b Make etcd component status consistent with health probes
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2023-06-15 11:48:03 +02:00
Dan Winship
eb5825b3a3
Set the node-ips annotation correctly with CloudDualStackNodeIPs 2023-06-14 14:30:19 +02:00
Antonio Ojea
b30e94b125 kube-proxy avoid race condition using LocalModeNodeCIDR
Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR
assigned to the node, it watches the Node object.

However, the kube-proxy startup process requires these watches to be set
up in different places, which opens the possibility of a race condition
if the same node is recreated and a different PodCIDR is assigned.

Initializing the second watch with the value obtained by the first one
allows us to detect this situation.

Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <aojea@google.com>
2023-06-06 21:10:53 +00:00
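The race-avoidance idea in this commit can be sketched as follows; the handler type and method are hypothetical simplifications of kube-proxy's node event handling:

```go
package main

import "fmt"

// nodePodCIDRHandler is seeded with the PodCIDR observed by the first
// watch, so if the node was deleted and recreated with a different
// PodCIDR in between, the second watch detects the change instead of
// silently proxying with stale state.
type nodePodCIDRHandler struct {
	podCIDR string
}

// OnNodeUpdate compares the incoming PodCIDR with the one captured at
// startup and reports whether kube-proxy must resync.
func (h *nodePodCIDRHandler) OnNodeUpdate(podCIDR string) (changed bool) {
	if h.podCIDR == "" {
		h.podCIDR = podCIDR
		return false
	}
	return h.podCIDR != podCIDR
}

func main() {
	// Initialize with the value obtained by the first watch...
	h := &nodePodCIDRHandler{podCIDR: "10.0.0.0/24"}
	// ...so a recreated node with a different PodCIDR is detected.
	fmt.Println(h.OnNodeUpdate("10.0.1.0/24"))
}
```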
HirazawaUi
22e8a99ec6 Fix the git-repo test error caused by the incorrect use of loop variables 2023-05-27 01:01:01 +08:00
Patrick Ohly
009a7a6fb9 dra scheduler plugin test: fix loopvar bug and "reserve" expected data
The `listAll` function returned a slice where all pointers referred to the same
instance. That instance had the value of the last list entry. As a result, unit
tests only compared that element.

During the reserve phase, the first claim gets reserved in two test
cases. Those two tests must expect that change. That hadn't been noticed before
because that first claim didn't get compared.
2023-05-26 17:18:07 +02:00
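The loopvar bug has a classic shape that can be shown in a few lines. The example below reproduces the aliasing with a hoisted variable so it behaves the same on all Go versions (before Go 1.22, taking the address of the range variable itself had the same effect); the names are illustrative, not the test's actual code:

```go
package main

import "fmt"

type claim struct{ name string }

// listAllBuggy reproduces the bug: every appended pointer refers to the
// same variable, so after the loop all elements hold the value of the
// last entry — and a comparison against the slice effectively checks
// only that one element.
func listAllBuggy(items []claim) []*claim {
	var c claim
	out := make([]*claim, 0, len(items))
	for _, it := range items {
		c = it
		out = append(out, &c) // bug: always the same address
	}
	return out
}

// listAllFixed copies each entry into a fresh per-iteration variable,
// so every pointer is distinct.
func listAllFixed(items []claim) []*claim {
	out := make([]*claim, 0, len(items))
	for _, it := range items {
		it := it // fresh variable; its address is unique per iteration
		out = append(out, &it)
	}
	return out
}

func main() {
	items := []claim{{"a"}, {"b"}, {"c"}}
	fmt.Println(listAllBuggy(items)[0].name, listAllFixed(items)[0].name)
}
```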
Michal Wozniak
e407c2b4b0 Add DisruptionTarget condition when preempting for critical pod 2023-05-24 09:14:44 +02:00
Kubernetes Prow Robot
4c39cdc418
Merge pull request #117815 from kerthcet/automated-cherry-pick-of-#117802-upstream-release-1.27
Automated cherry pick of #117802: Update podFailurePolicy comments from alpha-level to beta
2023-05-11 11:53:12 -07:00