diff --git a/docs/documents/apis.md b/docs/documents/apis.md index 1ed350d9e..8aa70fc4b 100644 --- a/docs/documents/apis.md +++ b/docs/documents/apis.md @@ -513,6 +513,21 @@ not be estimated during the time a MachineDeployment is paused. This is not set by default, which is treated as infinite deadline.

+ + +autoPreserveFailedMachineMax + + + +int32 + + + +(Optional) +

The maximum number of failed machines in the machine deployment that can be auto-preserved. +In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker’s machine deployments

+ + @@ -678,6 +693,19 @@ int32 (Optional) + + +autoPreserveFailedMachineMax + + + +int32 + + + +(Optional) + + @@ -833,6 +861,21 @@ Kubernetes meta/v1.Time

Last update time of current status

+ + +preserveExpiryTime + + + + +Kubernetes meta/v1.Time + + + + +

PreserveExpiryTime is the time at which MCM will stop preserving the machine

+ +
@@ -1071,6 +1114,22 @@ Kubernetes meta/v1.Duration +machinePreserveTimeout + + + + +Kubernetes meta/v1.Duration + + + + +(Optional) +

MachinePreserveTimeout is the timeout after which the machine preservation is stopped

+ + + + disableHealthTimeout @@ -1398,6 +1457,21 @@ not be estimated during the time a MachineDeployment is paused. This is not set by default, which is treated as infinite deadline.

+ + +autoPreserveFailedMachineMax + + + +int32 + + + +(Optional) +

The maximum number of failed machines in the machine deployment that can be auto-preserved. +In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker’s machine deployments

+ +
@@ -1860,6 +1934,19 @@ int32 (Optional) + + +autoPreserveFailedMachineMax + + + +int32 + + + +(Optional) + +
@@ -1998,6 +2085,20 @@ LastOperation

FailedMachines has summary of machines on which lastOperation Failed

+ + +autoPreserveFailedMachineCount + + + +int32 + + + +(Optional) +

AutoPreserveFailedMachineCount has a count of the number of failed machines in the machineset that are currently auto-preserved

+ +
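The `autoPreserveFailedMachineMax` field documented above is the worker-level value "distributed amongst the worker's machine deployments". A minimal sketch of such a distribution, assuming an even split with any remainder going to the first deployments (the actual policy is an implementation detail, not specified here):

```go
package main

import "fmt"

// distributeAutoPreserveMax splits the worker-level AutoPreserveFailedMachineMax
// across the worker pool's machine deployments (one per zone).
// Assumed policy for illustration: even split, remainder to the first deployments.
func distributeAutoPreserveMax(max, deployments int32) []int32 {
	quotas := make([]int32, deployments)
	for i := range quotas {
		quotas[i] = max / deployments
		if int32(i) < max%deployments {
			quotas[i]++
		}
	}
	return quotas
}

func main() {
	// The proposal's example: a max of 2 over 2 zones gives 1 machine per zone.
	fmt.Println(distributeAutoPreserveMax(2, 2)) // [1 1]
}
```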
diff --git a/docs/proposals/machine-preservation.md b/docs/proposals/machine-preservation.md index bf92d3c0b..111b7ed64 100644 --- a/docs/proposals/machine-preservation.md +++ b/docs/proposals/machine-preservation.md @@ -29,17 +29,18 @@ Related Issue: https://github.com/gardener/machine-controller-manager/issues/100 ## Proposal In order to achieve the objectives mentioned, the following are proposed: -1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of machines to be auto-preserved, -and the time duration for which these machines will be preserved. - ``` - machineControllerManager: - autoPreserveFailedMax: 0 - machinePreserveTimeout: 72h - ``` - * This configuration will be set per worker pool. - * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `autoPreserveFailedMax` will be distributed across N machine deployments. - * `autoPreserveFailedMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. - * Example: if `autoPreserveFailedMax` is set to 2, and the worker pool has 2 zones, then the maximum number of machines that will be preserved per zone is 1. +1. Enhance `worker` configuration in the `ShootSpec`, to specify the maximum number of failed machines that will be auto-preserved and the time duration for which machines will be preserved. +``` + workers: + - name: example-worker + autoPreserveFailedMachineMax: 2 + machineControllerManager: + machinePreserveTimeout: 72h +``` + * This configuration will be set per worker pool. + * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `autoPreserveFailedMachineMax` will be distributed across N machine deployments. + * `autoPreserveFailedMachineMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. 
+ * Example: if `autoPreserveFailedMachineMax` is set to 2, and the worker pool has 2 zones, then the maximum number of machines that will be preserved per zone is 1. 2. MCM will be modified to include a new sub-phase `Preserved` to indicate that the machine has been preserved by MCM. 3. Allow user/operator to request for preservation of a specific machine/node with the use of annotations : `node.machine.sapcloud.io/preserve=now` and `node.machine.sapcloud.io/preserve=when-failed`. 4. When annotation `node.machine.sapcloud.io/preserve=now` is added to a `Running` machine, the following will take place: @@ -49,29 +50,28 @@ and the time duration for which these machines will be preserved. - After timeout, the `node.machine.sapcloud.io/preserve=now` and `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` are deleted. The `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`. The machine phase is changed to `Running` and the CA may delete the node. - If a machine in `Running:Preserved` fails, it is moved to `Failed:Preserved`. 5. When annotation `node.machine.sapcloud.io/preserve=when-failed` is added to a `Running` machine and the machine goes to `Failed`, the following will take place: - - The machine is drained of pods except for Daemonset pods. + - Pods (other than DaemonSet pods) are drained. - The machine phase is changed to `Failed:Preserved`. - `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is added to the node to prevent CA from scaling it down. - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$. - After timeout, the annotations `node.machine.sapcloud.io/preserve=when-failed` and `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` are deleted. `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`. The phase is changed to `Terminating`. -6. 
When an un-annotated machine goes to `Failed` phase and `autoPreserveFailedMax` is not breached: +6. When an un-annotated machine goes to `Failed` phase and `autoPreserveFailedMachineMax` is not breached: - Pods (other than DaemonSet pods) are drained. - The machine's phase is changed to `Failed:Preserved`. - `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is added to the node to prevent CA from scaling it down. - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$. - After timeout, the annotation `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is deleted. `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`. The phase is changed to `Terminating`. - - Number of machines in `Failed:Preserved` phase count towards enforcing `autoPreserveFailedMax`. -7. If a failed machine is currently in `Failed:Preserved` and before timeout its VM/node is found to be Healthy, the machine will be moved to `Running:Preserved`. After the timeout, it will be moved to `Running`. -The rationale behind moving the machine to `Running:Preserved` rather than `Running`, is to allow pods to get scheduled on to the healthy node again without the autoscaler scaling it down due to under-utilization. -8. A user/operator can request MCM to stop preserving a machine/node in `Running:Preserved` or `Failed:Preserved` phase using the annotation: `node.machine.sapcloud.io/preserve=false`. + - The number of machines in `Failed:Preserved` phase counts towards enforcing `autoPreserveFailedMachineMax`. +7. A user/operator can request MCM to stop preserving a machine/node in `Running:Preserved` or `Failed:Preserved` phase by deleting the annotation: `node.machine.sapcloud.io/preserve`. * MCM will then move the machine either to `Running` phase or `Terminating`, depending on the phase of the machine before it was preserved. -9.
Machines of a MachineDeployment in `Preserved` sub-phase will also be counted towards the replica count and in the enforcement of maximum machines allowed for the MachineDeployment. -10. MCM will be modified to perform drain in `Failed` phase rather than `Terminating`. +8. Machines of a MachineDeployment in `Preserved` sub-phase will also be counted towards the replica count and in the enforcement of maximum machines allowed for the MachineDeployment. +9. MCM will be modified to perform drain in `Failed` phase for preserved machines. ## State Diagrams: 1. State Diagram for when a machine or its node is explicitly annotated for preservation: - ```mermaid +```mermaid stateDiagram-v2 state "Running" as R state "Running + Requested" as RR @@ -86,27 +86,26 @@ The rationale behind moving the machine to `Running:Preserved` rather than `Runn RR --> F: on failure F --> FP FP --> T: on timeout or preserve=false - FP --> RP: if node Healthy before timeout + FP --> R: if node Healthy before timeout T --> [*] R-->RP: annotated with preserve=now RP-->F: if node/VM not healthy - ``` +``` + 2. State Diagram for when an un-annotated `Running` machine fails (Auto-preservation): ```mermaid stateDiagram-v2 state "Running" as R - state "Running:Preserved" as RP state "Failed (node drained)" as F state "Failed:Preserved" as FP state "Terminating" as T [*] --> R R-->F: on failure - F --> FP: if autoPreserveFailedMax not breached - F --> T: if autoPreserveFailedMax breached + F --> FP: if autoPreserveFailedMachineMax not breached + F --> T: if autoPreserveFailedMachineMax breached FP --> T: on timeout or value=false - FP --> RP : if node Healthy before timeout - RP --> R: on timeout or preserve=false + FP --> R : if node Healthy before timeout T --> [*] ``` @@ -128,21 +127,22 @@ The rationale behind moving the machine to `Running:Preserved` rather than `Runn 4. Operator analyzes the VM. 
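A simplified subset of the transitions from the state diagrams above, condensed into a lookup function. This is illustrative only, not MCM's actual reconciliation logic, and the event names are invented for the sketch:

```go
package main

import "fmt"

// transition sketches a subset of the preservation state machine from the
// diagrams above. Illustrative only; event names are invented for the sketch.
func transition(phase, event string) string {
	switch phase + "/" + event {
	case "Running/preserve=now":
		return "Running:Preserved" // explicit preservation of a healthy machine
	case "Running/failure":
		return "Failed"
	case "Running:Preserved/unhealthy":
		return "Failed" // node/VM not healthy
	case "Failed/within-auto-preserve-max":
		return "Failed:Preserved" // autoPreserveFailedMachineMax not breached
	case "Failed/auto-preserve-max-breached":
		return "Terminating"
	case "Failed:Preserved/healthy-before-timeout":
		return "Running"
	case "Failed:Preserved/timeout":
		return "Terminating"
	default:
		return phase // no transition defined for this event
	}
}

func main() {
	fmt.Println(transition("Failed", "within-auto-preserve-max")) // Failed:Preserved
}
```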
-### Use Case 3: Auto-Preservation +### Use Case 3: Auto-Preservation of Failed Machines to Aid Failure Analysis and Recovery **Scenario:** Machine fails unexpectedly, no prior annotation. #### Steps: 1. Machine transitions to `Failed` phase. 2. Machine is drained. -3. If `autoPreserveFailedMax` is not breached, machine moved to `Failed:Preserved` phase by MCM. +3. If `autoPreserveFailedMachineMax` is not breached, the machine is moved to `Failed:Preserved` phase by MCM. 4. After `machinePreserveTimeout`, machine is terminated by MCM. +5. If the machine is brought back to `Running` phase before the timeout, pods can be scheduled on it again. ### Use Case 4: Early Release **Scenario:** Operator has performed his analysis and no longer requires machine to be preserved. #### Steps: 1. Machine is in `Running:Preserved` or `Failed:Preserved` phase. -2. Operator adds: `node.machine.sapcloud.io/preserve=false` to node. +2. Operator removes `node.machine.sapcloud.io/preserve` from the node. 3. MCM transitions machine to `Running` or `Terminating` for `Running:Preserved` or `Failed:Preserved` respectively, even though `machinePreserveTimeout` has not expired. -4. If machine was in `Failed:Preserved`, capacity becomes available for auto-preservation. +4. If the machine was auto-preserved, capacity becomes available for auto-preservation. ## Points to Note @@ -151,13 +151,12 @@ The rationale behind moving the machine to `Running:Preserved` rather than `Runn 3. Consumers (with access to shoot cluster) can annotate Nodes they would like to preserve. 4. Operators (with access to control plane) can additionally annotate Machines that they would like to preserve. This feature can be used when a Machine does not have a backing Node and the operator wishes to preserve the backing VM. 5. If the backing Node object exists but does not have the preservation annotation, preservation annotations added on the Machine will be honoured. -6.
However, if a backing Node exists for a Machine and has the preservation annotation, the Node's annotation value will override the Machine annotation value, and be synced to the Machine object. -7. If `autoPreserveFailedMax` is reduced in the Shoot Spec, older machines are moved to `Terminating` phase before newer ones. +6. However, if a backing Node exists for a Machine and has the preservation annotation, the Node's annotation value will override the Machine annotation value. +7. If `autoPreserveFailedMachineMax` is reduced in the Shoot Spec, older machines are moved to `Terminating` phase before newer ones. 8. In case of a scale down of an MCD's replica count, `Preserved` machines will be the last to be scaled down. Replica count will always be honoured. -9. If the value for annotation key `cluster-autoscaler.kubernetes.io/scale-down-disabled` for a machine in `Running:Preserved` is changed to `false` by a user, the value will be overwritten to `true` by MCM. -10. On increase/decrease of timeout, the new value will only apply to machines that go into `Preserved` phase after the change. Operators can always edit `machine.CurrentStatus.PreserveExpiryTime` to prolong the expiry time of existing `Preserved` machines. -11. [Modify CA FAQ](https://github.com/gardener/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) once feature is developed to use `node.machine.sapcloud.io/preserve=now` instead of the `cluster-autoscaler.kubernetes.io/scale-down-disabled=true` currently suggested. This would: - - harmonise machine flow - - shield from CA's internals - - make it generic and no longer CA specific - - allow a timeout to be specified \ No newline at end of file +9. On increase/decrease of `machinePreserveTimeout`, the new value will only apply to machines that go into `Preserved` phase after the change. 
Operators can always edit `machine.CurrentStatus.PreserveExpiryTime` to prolong the expiry time of existing `Preserved` machines. +10. [Modify CA FAQ](https://github.com/gardener/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) once feature is developed to use `node.machine.sapcloud.io/preserve=now` instead of the `cluster-autoscaler.kubernetes.io/scale-down-disabled=true` currently suggested. This would: + - harmonise machine flow + - shield from CA's internals + - make it generic and no longer CA specific + - allow a timeout to be specified \ No newline at end of file diff --git a/docs/usage/machine-preservation.md b/docs/usage/machine-preservation.md new file mode 100644 index 000000000..01301e9bf --- /dev/null +++ b/docs/usage/machine-preservation.md @@ -0,0 +1,153 @@ +# Machine Preservation — Usage Guide + +This document explains how to **use machine preservation** to retain machines and their backing VMs. + +### What is preservation in MCM? + +A machine and its backing node can be preserved by an end-user/SRE/operator to retain machines and their backing VMs for debugging, analysis, or operational safety. + +A preserved machine/node has the following properties: +- When a machine in `Failed` phase is preserved, it continues to stay in `Failed` state until `machinePreserveTimeout` runs out, without getting terminated. This allows end-users and SREs to debug the machine and backing node, and take necessary actions to recover the machine if needed. +- If a machine is in its `Failed` phase and is preserved, on recovering from failure, the machine can be moved to `Running` phase and the backing node can be uncordoned to allow scheduling of pods again. +- If a machine is preserved and is in its `Failed` phase, MCM drains the backing node of all pods, but the daemonset pods remain on the node. 
+- If a machine is preserved in its `Running` phase, the MCM adds the CA scale-down-disabled annotation to prevent the CA from scaling down the machine in case of underutilization. +- When the machineset is scaled down, machines in the machineset marked for preservation are de-prioritized for deletion. + + +> Note: If a user sets a deletion timestamp on the machine/node (e.g., by using kubectl delete machine/node), the machine and backing node will be deleted. Preservation will not prevent this. + +> Note: If the desired replica count for a machineset cannot be met without scaling down preserved machines, the required number of preserved machines will be scaled-down. + +### Changes in a machine/node object on preservation: + +- If the machine is preserved in `Running` phase: + - The `PreserveExpiryTime` is set in the machine's status to indicate when preservation will end. + - The CA scale-down-disabled annotation is added. + - The `NodeCondition` of `Type=Preserved` is updated to show that the preservation was successful. + - If a machine has no backing node, only `PreserveExpiryTime` is set. +- If the machine is preserved and in `Failed` phase: + - The `PreserveExpiryTime` is set in the machine's status to indicate when preservation will end. + - The CA scale-down-disabled annotation is added on the backing node, + - The backing node is cordoned to prevent scheduling of new pods on it. + - The backing node is drained of all pods but the daemonset pods remain. + - The `NodeCondition` of `Type=Preserved` is updated to show that the preservation was successful. + - If a machine has no backing node, only `PreserveExpiryTime` is set. + +### Changes in a machine/node object when preservation stops: +- If the machine is in `Running` phase: + - The `PreserveExpiryTime` is cleared. + - The CA scale-down-disabled annotation is removed. + - The `NodeCondition` of `Type=Preserved` is updated to show that the preservation has stopped. 
+ - If a machine has no backing node, only `PreserveExpiryTime` is cleared. +- If the machine is in `Failed` phase: + - The `PreserveExpiryTime` is cleared. + - The CA scale-down-disabled annotation is removed. + - The `NodeCondition` of `Type=Preserved` is updated to show that the preservation has stopped. + - If a machine has no backing node, only `PreserveExpiryTime` is cleared. + - The machine is moved to `Terminating` phase by MCM. + +--- +### How can a machine be preserved? + +- The preservation feature offers two modes of preservation: + - Manual preservation by adding annotations + - Auto-preservation by MCM by specifying `AutoPreserveFailedMachineMax` for a workerpool. This value is distributed evenly across zones (MCD). +- For manual preservation, the end-user and operators must annotate either the node or the machine objects with annotation key: `node.machine.sapcloud.io/preserve` and the desired value, as described in the table below. +- If there is no backing node, the machine object can still be annotated. + +#### Configuration (Shoot Spec) + +Preservation is enabled and controlled per **worker pool**: + +```yaml +apiVersion: core.gardener.cloud/v1beta1 +kind: Shoot +... +spec: + workers: + - cri: + name: containerd + name: worker1 + autoPreserveFailedMachineMax: 1 + machineControllerManager: + machinePreserveTimeout: 72h +``` + +#### Configuration Semantics +- `AutoPreserveFailedMachineMax` : Maximum number of failed machines that can be auto-preserved concurrently in a worker pool. This value is distributed across machineDeployments (zones) in the worker pool. If the limit is reached, additional failed machines will not be preserved and will proceed to termination as usual. +- `machinePreserveTimeout` : Duration after which preserved machines are automatically released + +> Note: ⚠️ Changes to `machinePreserveTimeout` apply only to preservation done after the change. 
+> If `AutoPreserveFailedMachineMax` is decreased, preservation is stopped for older auto-preserved machines(earlier creationTimestamp) until the number of preserved machines is within the new limit. +> If the number of failed machines exceeds the `AutoPreserveFailedMachineMax` limit at any given time, machines with more recent creationTimestamp are auto-preserved first. + +### Preservation annotations + +annotation key: `node.machine.sapcloud.io/preserve` + +**Manual Annotation values:** + +| Annotation value | Purpose | +| ---------------- |-----------------------------------------------------------------------------| +| when-failed | To be added when the machine/node needs to be preserved **only on failure** | +| now | To be added when the machine/node needs to be preserved **now** | +| false | To be added if a machine should not be auto-preserved by MCM | + +**Auto-preservation Annotation values added by MCM:** + +| Annotation value | Purpose | +| ---------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------| +| auto-preserve | Added by MCM to indicate that a machine has been **auto-preserved** on failure. This machine will be counted towards **AutoPreserveFailedMachineMax** | + +### ⚠️ Preservation Annotation semantics: + +Both node and machine objects can be annotated for preservation. +However, if both machine and node have the preservation annotation, the node's annotation value (even if set to "") is honoured and the machine's annotation is deleted. +To prevent confusion and unintended behaviour, it is recommended to use preservation by annotating the node object, if it exists and can be accessed. +When the `PreserveExpiryTime` of a preserved machine is reached, the preservation will be stopped. Additionally, the preservation annotation is removed to prevent undesired re-preservation of the same machine. 
This is applicable for both manual and auto-preservation. + +When a machine is annotated with value `false`, the annotation and its value are not removed by MCM. This is to explicitly indicate that the machine should not be auto-preserved by MCM. If the annotation value is set to empty or the annotation is deleted, MCM will again auto-preserve the machine if it is in `Failed` phase and the `AutoPreserveFailedMachineMax` limit is not reached. + +--- +### How to manually stop preservation before PreserveExpiryTime: +**In the case of manual preservation:** +To manually stop preservation, the preservation annotation must be deleted from whichever object (node/machine) is annotated for preservation. + +**In the case of auto-preservation:** +To stop auto-preservation, the machine should be annotated with `node.machine.sapcloud.io/preserve=false`. +Deleting the annotation or setting its value to empty will not stop preservation, since MCM will again auto-preserve the machine if it is in `Failed` phase and the `AutoPreserveFailedMachineMax` limit is not reached. +Setting the annotation value to `false` explicitly indicates that the machine should no longer be auto-preserved by MCM, and the preservation will be stopped. + +--- + +### How to prevent a machine from being auto-preserved by MCM: + +To prevent a `Failed` machine from being auto-preserved, the node/machine object must be annotated with `node.machine.sapcloud.io/preserve=false`. If a currently preserved machine is annotated with `false`, the preservation will be stopped. +Here too, the preservation annotation semantics from above apply: if both machine and node are annotated, the node's annotation value is honoured and the machine's annotation is deleted. + +If the preserve annotation value is set to empty or the annotation is deleted, MCM will again auto-preserve the machine if it is in `Failed` phase and the `AutoPreserveFailedMachineMax` limit is not reached.
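The annotation semantics above can be sketched as a small resolution helper. This is an illustrative sketch, not an actual MCM function; it encodes that the node's value wins (even when empty), the machine's copy is dropped, and only an explicit `false` opts a machine out of auto-preservation:

```go
package main

import "fmt"

const preserveKey = "node.machine.sapcloud.io/preserve"

// effectivePreserveValue resolves the preservation annotation: the node's
// value wins (even when empty) and the machine's copy is deleted; otherwise
// the machine's own annotation is used. Illustrative sketch, not MCM code.
func effectivePreserveValue(machineAnn, nodeAnn map[string]string) (string, bool) {
	if v, ok := nodeAnn[preserveKey]; ok {
		delete(machineAnn, preserveKey) // node's annotation overrides; machine's copy dropped
		return v, true
	}
	v, ok := machineAnn[preserveKey]
	return v, ok
}

// autoPreserveAllowed: only an explicit "false" opts a Failed machine out of
// auto-preservation; an empty value or a missing annotation does not.
func autoPreserveAllowed(value string, present bool) bool {
	return !present || value != "false"
}

func main() {
	machine := map[string]string{preserveKey: "when-failed"}
	node := map[string]string{preserveKey: ""} // empty node value still wins
	v, ok := effectivePreserveValue(machine, node)
	fmt.Printf("value=%q present=%v autoPreserveAllowed=%v\n", v, ok, autoPreserveAllowed(v, ok))
}
```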
+ +--- +### What happens when a machine recovers from failure and moves to `Running` during preservation? + +Depending on the annotation value - (`now/when-failed/auto-preserve`), the behaviour differs. This is to reflect the meaning behind the annotation value. + +1. `now`: on recovery from failure, machine preservation continues until `PreserveExpiryTime` +2. `when-failed`: on recovery from failure, machine preservation stops. This is because the annotation value clearly expresses that a machine must be preserved only when `Failed`. If the annotation is not changed, and the machine fails again, the machine is preserved again. +3. `auto-preserve`: since MCM performs auto-preservation of machines only when they are `Failed`, on recovery, the machine preservation is stopped. + +In all the cases, when the machine moves to `Running` during preservation, the backing node is uncordoned to allow pods to be scheduled on it again. + +>Note: When a machine recovers to `Running` and preservation is stopped, CA's `scale-down-unneeded-time` comes into play. If the node's utilization is below the utilization threshold configured even after `scale-down-unneeded-time`, CA will scale down the machine. + +--- +### Important Notes & Limitations +- Rolling updates: Preservation is ignored; Failed machines are replaced as usual. +- Shoot hibernation overrides preservation. +- Replica enforcement: Preserved machines count towards MachineDeployment replicas, and can be scaled down if the desired replica count is reduced below the number of preserved machines. +- Scale-down preference: Preserved machines are the last to be scaled down. +- Preservation status is visible via Node Conditions and Machine Status fields. +- machinePreserveTimeout changes do not affect existing preserved machines. Operators may edit PreserveExpiryTime directly if required to extend preservation. 
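The per-value recovery behaviour and the expiry computation described above can be sketched as follows; `preservationContinuesOnRecovery` is a hypothetical helper written for this illustration, not MCM code:

```go
package main

import (
	"fmt"
	"time"
)

// preservationContinuesOnRecovery encodes the per-value behaviour above:
// only "now" keeps a recovered machine preserved until PreserveExpiryTime;
// "when-failed" and "auto-preserve" release it once it is Running again.
func preservationContinuesOnRecovery(value string) bool {
	return value == "now"
}

func main() {
	// PreserveExpiryTime is recorded as preservation start + machinePreserveTimeout
	// (72h in the example Shoot spec). Later changes to the timeout do not move
	// the expiry of machines that are already preserved.
	machinePreserveTimeout := 72 * time.Hour
	start := time.Now()
	preserveExpiryTime := start.Add(machinePreserveTimeout)

	fmt.Println(preserveExpiryTime.Sub(start))                  // 72h0m0s
	fmt.Println(preservationContinuesOnRecovery("when-failed")) // false
}
```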
+ + +> NOTE: To prevent confusion and unintended behaviour, it is recommended to use preservation by annotating the node object if it exists and can be accessed. diff --git a/kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml b/kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml index d836282a2..f55235fc0 100644 --- a/kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml +++ b/kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml @@ -67,6 +67,12 @@ spec: spec: description: Specification of the desired behavior of the MachineDeployment. properties: + autoPreserveFailedMachineMax: + description: |- + The maximum number of failed machines in the machine deployment that can be auto-preserved. + In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker's machine deployments + format: int32 + type: integer minReadySeconds: description: |- Minimum number of seconds for which a newly created machine should be ready @@ -296,6 +302,10 @@ spec: description: MachineInPlaceUpdateTimeout is the timeout after which in-place update is declared failed. type: string + machinePreserveTimeout: + description: MachinePreserveTimeout is the timeout after which + the machine preservation is stopped + type: string maxEvictRetries: description: MaxEvictRetries is the number of retries that will be attempted while draining the node. diff --git a/kubernetes/crds/machine.sapcloud.io_machines.yaml b/kubernetes/crds/machine.sapcloud.io_machines.yaml index 9378f0274..fcea16750 100644 --- a/kubernetes/crds/machine.sapcloud.io_machines.yaml +++ b/kubernetes/crds/machine.sapcloud.io_machines.yaml @@ -95,6 +95,10 @@ spec: description: MachineInPlaceUpdateTimeout is the timeout after which in-place update is declared failed. 
type: string + machinePreserveTimeout: + description: MachinePreserveTimeout is the timeout after which the + machine preservation is stopped + type: string maxEvictRetries: description: MaxEvictRetries is the number of retries that will be attempted while draining the node. @@ -287,6 +291,11 @@ spec: description: MachinePhase is a label for the condition of a machine at the current time. type: string + preserveExpiryTime: + description: PreserveExpiryTime is the time at which MCM will + stop preserving the machine + format: date-time + type: string timeoutActive: type: boolean type: object diff --git a/kubernetes/crds/machine.sapcloud.io_machinesets.yaml b/kubernetes/crds/machine.sapcloud.io_machinesets.yaml index c569dffe1..ba176b8df 100644 --- a/kubernetes/crds/machine.sapcloud.io_machinesets.yaml +++ b/kubernetes/crds/machine.sapcloud.io_machinesets.yaml @@ -60,6 +60,9 @@ spec: spec: description: MachineSetSpec is the specification of a MachineSet. properties: + autoPreserveFailedMachineMax: + format: int32 + type: integer machineClass: description: ClassSpec is the class specification of machine properties: @@ -178,6 +181,10 @@ spec: description: MachineInPlaceUpdateTimeout is the timeout after which in-place update is declared failed. type: string + machinePreserveTimeout: + description: MachinePreserveTimeout is the timeout after which + the machine preservation is stopped + type: string maxEvictRetries: description: MaxEvictRetries is the number of retries that will be attempted while draining the node. @@ -312,6 +319,11 @@ spec: description: MachineSetStatus holds the most recently observed status of MachineSet. properties: + autoPreserveFailedMachineCount: + description: AutoPreserveFailedMachineCount has a count of the number + of failed machines in the machineset that are currently auto-preserved + format: int32 + type: integer availableReplicas: description: The number of available replicas (ready for at least minReadySeconds) for this replica set. 
diff --git a/pkg/apis/machine/types.go b/pkg/apis/machine/types.go index 0facf09ba..6772c2100 100644 --- a/pkg/apis/machine/types.go +++ b/pkg/apis/machine/types.go @@ -97,6 +97,8 @@ type MachineConfiguration struct { // MachineInPlaceUpdateTimeout is the timeout after which in-place update is declared failed. MachineInPlaceUpdateTimeout *metav1.Duration + // MachinePreserveTimeout is the timeout after which the machine preservation is stopped + MachinePreserveTimeout *metav1.Duration // DisableHealthTimeout if set to true, health timeout will be ignored. Leading to machine never being declared failed. // This is intended to be used only for in-place updates. DisableHealthTimeout *bool @@ -158,6 +160,9 @@ type CurrentStatus struct { // Last update time of current status LastUpdateTime metav1.Time + + // PreserveExpiryTime is the time at which MCM will stop preserving the machine + PreserveExpiryTime *metav1.Time } // MachineStatus holds the most recently observed status of Machine. @@ -351,6 +356,8 @@ type MachineSetSpec struct { Template MachineTemplateSpec MinReadySeconds int32 + + AutoPreserveFailedMachineMax int32 } // MachineSetConditionType is the condition on machineset object @@ -409,6 +416,9 @@ type MachineSetStatus struct { // FailedMachines has summary of machines on which lastOperation Failed FailedMachines *[]MachineSummary + + // AutoPreserveFailedMachineCount has a count of the number of failed machines in the machineset that are currently auto-preserved + AutoPreserveFailedMachineCount int32 } // MachineSummary store the summary of machine. @@ -487,6 +497,10 @@ type MachineDeploymentSpec struct { // not be estimated during the time a MachineDeployment is paused. This is not set // by default. ProgressDeadlineSeconds *int32 + + // The maximum number of failed machines in the machine deployment that can be auto-preserved. 
+ // In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker's machine deployments + AutoPreserveFailedMachineMax int32 } // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object diff --git a/pkg/apis/machine/v1alpha1/machine_types.go b/pkg/apis/machine/v1alpha1/machine_types.go index 19e017925..13db2a7d3 100644 --- a/pkg/apis/machine/v1alpha1/machine_types.go +++ b/pkg/apis/machine/v1alpha1/machine_types.go @@ -244,6 +244,26 @@ const ( UpdateFailed string = "UpdateFailed" ) +const ( + // NodePreserved is a node condition type for preservation of machines to allow end-user to know that a node is preserved + NodePreserved corev1.NodeConditionType = "Preserved" + + // PreservedByMCM is a node condition reason for preservation of machines to indicate that the node is auto-preserved by MCM + PreservedByMCM string = "Preserved by MCM." + + // PreservedByUser is a node condition reason to indicate that a machine/node has been preserved due to explicit annotation by user + PreservedByUser string = "Preserved by user." + + // PreservationStopped is a node condition reason to indicate that a machine/node preservation has been stopped due to annotation update or timeout + PreservationStopped string = "Preservation stopped." + + // PreservedNodeDrainSuccessful is a constant for the message in condition that indicates that the preserved node's drain is successful + PreservedNodeDrainSuccessful string = "Preserved node drained successfully." + + // PreservedNodeDrainUnsuccessful is a constant for the message in condition that indicates that the preserved node's drain was not successful + PreservedNodeDrainUnsuccessful string = "Preserved node could not be drained." +) + // CurrentStatus contains information about the current status of Machine. 
type CurrentStatus struct { Phase MachinePhase `json:"phase,omitempty"` @@ -252,6 +272,9 @@ type CurrentStatus struct { // Last update time of current status LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"` + + // PreserveExpiryTime is the time at which MCM will stop preserving the machine + PreserveExpiryTime *metav1.Time `json:"preserveExpiryTime,omitempty"` } // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object diff --git a/pkg/apis/machine/v1alpha1/machinedeployment_types.go b/pkg/apis/machine/v1alpha1/machinedeployment_types.go index 1839ad866..2e1e6da0e 100644 --- a/pkg/apis/machine/v1alpha1/machinedeployment_types.go +++ b/pkg/apis/machine/v1alpha1/machinedeployment_types.go @@ -91,6 +91,11 @@ type MachineDeploymentSpec struct { // by default, which is treated as infinite deadline. // +optional ProgressDeadlineSeconds *int32 `json:"progressDeadlineSeconds,omitempty"` + + // The maximum number of failed machines in the machine deployment that can be auto-preserved. 
+ // In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker's machine deployments + // +optional + AutoPreserveFailedMachineMax int32 `json:"autoPreserveFailedMachineMax,omitempty"` } const ( diff --git a/pkg/apis/machine/v1alpha1/machineset_types.go b/pkg/apis/machine/v1alpha1/machineset_types.go index 2e6eb1d6e..82fbb2dbf 100644 --- a/pkg/apis/machine/v1alpha1/machineset_types.go +++ b/pkg/apis/machine/v1alpha1/machineset_types.go @@ -68,6 +68,9 @@ type MachineSetSpec struct { // +optional MinReadySeconds int32 `json:"minReadySeconds,omitempty"` + + // +optional + AutoPreserveFailedMachineMax int32 `json:"autoPreserveFailedMachineMax,omitempty"` } // MachineSetConditionType is the condition on machineset object @@ -135,4 +138,8 @@ type MachineSetStatus struct { // FailedMachines has summary of machines on which lastOperation Failed // +optional FailedMachines *[]MachineSummary `json:"failedMachines,omitempty"` + + // AutoPreserveFailedMachineCount has a count of the number of failed machines in the machineset that are currently auto-preserved + // +optional + AutoPreserveFailedMachineCount int32 `json:"autoPreserveFailedMachineCount,omitempty"` } diff --git a/pkg/apis/machine/v1alpha1/shared_types.go b/pkg/apis/machine/v1alpha1/shared_types.go index 687151218..d9a7d0c7b 100644 --- a/pkg/apis/machine/v1alpha1/shared_types.go +++ b/pkg/apis/machine/v1alpha1/shared_types.go @@ -44,6 +44,10 @@ type MachineConfiguration struct { // +optional MachineInPlaceUpdateTimeout *metav1.Duration `json:"inPlaceUpdateTimeout,omitempty"` + // MachinePreserveTimeout is the timeout after which the machine preservation is stopped + // +optional + MachinePreserveTimeout *metav1.Duration `json:"machinePreserveTimeout,omitempty"` + // DisableHealthTimeout if set to true, health timeout will be ignored. Leading to machine never being declared failed. 
// This is intended to be used only for in-place updates. // +optional diff --git a/pkg/apis/machine/v1alpha1/zz_generated.conversion.go b/pkg/apis/machine/v1alpha1/zz_generated.conversion.go index 7d5d1b485..b636363da 100644 --- a/pkg/apis/machine/v1alpha1/zz_generated.conversion.go +++ b/pkg/apis/machine/v1alpha1/zz_generated.conversion.go @@ -13,8 +13,8 @@ import ( unsafe "unsafe" machine "github.com/gardener/machine-controller-manager/pkg/apis/machine" - v1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + corev1 "k8s.io/api/core/v1" + v1 "k8s.io/apimachinery/pkg/apis/meta/v1" conversion "k8s.io/apimachinery/pkg/conversion" runtime "k8s.io/apimachinery/pkg/runtime" intstr "k8s.io/apimachinery/pkg/util/intstr" @@ -348,6 +348,7 @@ func autoConvert_v1alpha1_CurrentStatus_To_machine_CurrentStatus(in *CurrentStat out.Phase = machine.MachinePhase(in.Phase) out.TimeoutActive = in.TimeoutActive out.LastUpdateTime = in.LastUpdateTime + out.PreserveExpiryTime = (*v1.Time)(unsafe.Pointer(in.PreserveExpiryTime)) return nil } @@ -360,6 +361,7 @@ func autoConvert_machine_CurrentStatus_To_v1alpha1_CurrentStatus(in *machine.Cur out.Phase = MachinePhase(in.Phase) out.TimeoutActive = in.TimeoutActive out.LastUpdateTime = in.LastUpdateTime + out.PreserveExpiryTime = (*v1.Time)(unsafe.Pointer(in.PreserveExpiryTime)) return nil } @@ -457,10 +459,10 @@ func Convert_machine_Machine_To_v1alpha1_Machine(in *machine.Machine, out *Machi func autoConvert_v1alpha1_MachineClass_To_machine_MachineClass(in *MachineClass, out *machine.MachineClass, s conversion.Scope) error { out.ObjectMeta = in.ObjectMeta out.NodeTemplate = (*machine.NodeTemplate)(unsafe.Pointer(in.NodeTemplate)) - out.CredentialsSecretRef = (*v1.SecretReference)(unsafe.Pointer(in.CredentialsSecretRef)) + out.CredentialsSecretRef = (*corev1.SecretReference)(unsafe.Pointer(in.CredentialsSecretRef)) out.ProviderSpec = in.ProviderSpec out.Provider = in.Provider - out.SecretRef = 
(*v1.SecretReference)(unsafe.Pointer(in.SecretRef)) + out.SecretRef = (*corev1.SecretReference)(unsafe.Pointer(in.SecretRef)) return nil } @@ -472,10 +474,10 @@ func Convert_v1alpha1_MachineClass_To_machine_MachineClass(in *MachineClass, out func autoConvert_machine_MachineClass_To_v1alpha1_MachineClass(in *machine.MachineClass, out *MachineClass, s conversion.Scope) error { out.ObjectMeta = in.ObjectMeta out.NodeTemplate = (*NodeTemplate)(unsafe.Pointer(in.NodeTemplate)) - out.CredentialsSecretRef = (*v1.SecretReference)(unsafe.Pointer(in.CredentialsSecretRef)) + out.CredentialsSecretRef = (*corev1.SecretReference)(unsafe.Pointer(in.CredentialsSecretRef)) out.Provider = in.Provider out.ProviderSpec = in.ProviderSpec - out.SecretRef = (*v1.SecretReference)(unsafe.Pointer(in.SecretRef)) + out.SecretRef = (*corev1.SecretReference)(unsafe.Pointer(in.SecretRef)) return nil } @@ -527,10 +529,11 @@ func Convert_machine_MachineClassList_To_v1alpha1_MachineClassList(in *machine.M } func autoConvert_v1alpha1_MachineConfiguration_To_machine_MachineConfiguration(in *MachineConfiguration, out *machine.MachineConfiguration, s conversion.Scope) error { - out.MachineDrainTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineDrainTimeout)) - out.MachineHealthTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineHealthTimeout)) - out.MachineCreationTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineCreationTimeout)) - out.MachineInPlaceUpdateTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineInPlaceUpdateTimeout)) + out.MachineDrainTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineDrainTimeout)) + out.MachineHealthTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineHealthTimeout)) + out.MachineCreationTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineCreationTimeout)) + out.MachineInPlaceUpdateTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineInPlaceUpdateTimeout)) + out.MachinePreserveTimeout = (*v1.Duration)(unsafe.Pointer(in.MachinePreserveTimeout)) out.DisableHealthTimeout 
= (*bool)(unsafe.Pointer(in.DisableHealthTimeout)) out.MaxEvictRetries = (*int32)(unsafe.Pointer(in.MaxEvictRetries)) out.NodeConditions = (*string)(unsafe.Pointer(in.NodeConditions)) @@ -543,10 +546,11 @@ func Convert_v1alpha1_MachineConfiguration_To_machine_MachineConfiguration(in *M } func autoConvert_machine_MachineConfiguration_To_v1alpha1_MachineConfiguration(in *machine.MachineConfiguration, out *MachineConfiguration, s conversion.Scope) error { - out.MachineDrainTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineDrainTimeout)) - out.MachineHealthTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineHealthTimeout)) - out.MachineCreationTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineCreationTimeout)) - out.MachineInPlaceUpdateTimeout = (*metav1.Duration)(unsafe.Pointer(in.MachineInPlaceUpdateTimeout)) + out.MachineDrainTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineDrainTimeout)) + out.MachineHealthTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineHealthTimeout)) + out.MachineCreationTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineCreationTimeout)) + out.MachineInPlaceUpdateTimeout = (*v1.Duration)(unsafe.Pointer(in.MachineInPlaceUpdateTimeout)) + out.MachinePreserveTimeout = (*v1.Duration)(unsafe.Pointer(in.MachinePreserveTimeout)) out.DisableHealthTimeout = (*bool)(unsafe.Pointer(in.DisableHealthTimeout)) out.MaxEvictRetries = (*int32)(unsafe.Pointer(in.MaxEvictRetries)) out.NodeConditions = (*string)(unsafe.Pointer(in.NodeConditions)) @@ -644,7 +648,7 @@ func Convert_machine_MachineDeploymentList_To_v1alpha1_MachineDeploymentList(in func autoConvert_v1alpha1_MachineDeploymentSpec_To_machine_MachineDeploymentSpec(in *MachineDeploymentSpec, out *machine.MachineDeploymentSpec, s conversion.Scope) error { out.Replicas = in.Replicas - out.Selector = (*metav1.LabelSelector)(unsafe.Pointer(in.Selector)) + out.Selector = (*v1.LabelSelector)(unsafe.Pointer(in.Selector)) if err := 
Convert_v1alpha1_MachineTemplateSpec_To_machine_MachineTemplateSpec(&in.Template, &out.Template, s); err != nil { return err } @@ -656,6 +660,7 @@ func autoConvert_v1alpha1_MachineDeploymentSpec_To_machine_MachineDeploymentSpec out.Paused = in.Paused out.RollbackTo = (*machine.RollbackConfig)(unsafe.Pointer(in.RollbackTo)) out.ProgressDeadlineSeconds = (*int32)(unsafe.Pointer(in.ProgressDeadlineSeconds)) + out.AutoPreserveFailedMachineMax = in.AutoPreserveFailedMachineMax return nil } @@ -666,7 +671,7 @@ func Convert_v1alpha1_MachineDeploymentSpec_To_machine_MachineDeploymentSpec(in func autoConvert_machine_MachineDeploymentSpec_To_v1alpha1_MachineDeploymentSpec(in *machine.MachineDeploymentSpec, out *MachineDeploymentSpec, s conversion.Scope) error { out.Replicas = in.Replicas - out.Selector = (*metav1.LabelSelector)(unsafe.Pointer(in.Selector)) + out.Selector = (*v1.LabelSelector)(unsafe.Pointer(in.Selector)) if err := Convert_machine_MachineTemplateSpec_To_v1alpha1_MachineTemplateSpec(&in.Template, &out.Template, s); err != nil { return err } @@ -678,6 +683,7 @@ func autoConvert_machine_MachineDeploymentSpec_To_v1alpha1_MachineDeploymentSpec out.Paused = in.Paused out.RollbackTo = (*RollbackConfig)(unsafe.Pointer(in.RollbackTo)) out.ProgressDeadlineSeconds = (*int32)(unsafe.Pointer(in.ProgressDeadlineSeconds)) + out.AutoPreserveFailedMachineMax = in.AutoPreserveFailedMachineMax return nil } @@ -852,7 +858,7 @@ func Convert_machine_MachineSetList_To_v1alpha1_MachineSetList(in *machine.Machi func autoConvert_v1alpha1_MachineSetSpec_To_machine_MachineSetSpec(in *MachineSetSpec, out *machine.MachineSetSpec, s conversion.Scope) error { out.Replicas = in.Replicas - out.Selector = (*metav1.LabelSelector)(unsafe.Pointer(in.Selector)) + out.Selector = (*v1.LabelSelector)(unsafe.Pointer(in.Selector)) if err := Convert_v1alpha1_ClassSpec_To_machine_ClassSpec(&in.MachineClass, &out.MachineClass, s); err != nil { return err } @@ -860,6 +866,7 @@ func 
autoConvert_v1alpha1_MachineSetSpec_To_machine_MachineSetSpec(in *MachineSe return err } out.MinReadySeconds = in.MinReadySeconds + out.AutoPreserveFailedMachineMax = in.AutoPreserveFailedMachineMax return nil } @@ -870,7 +877,7 @@ func Convert_v1alpha1_MachineSetSpec_To_machine_MachineSetSpec(in *MachineSetSpe func autoConvert_machine_MachineSetSpec_To_v1alpha1_MachineSetSpec(in *machine.MachineSetSpec, out *MachineSetSpec, s conversion.Scope) error { out.Replicas = in.Replicas - out.Selector = (*metav1.LabelSelector)(unsafe.Pointer(in.Selector)) + out.Selector = (*v1.LabelSelector)(unsafe.Pointer(in.Selector)) if err := Convert_machine_ClassSpec_To_v1alpha1_ClassSpec(&in.MachineClass, &out.MachineClass, s); err != nil { return err } @@ -878,6 +885,7 @@ func autoConvert_machine_MachineSetSpec_To_v1alpha1_MachineSetSpec(in *machine.M return err } out.MinReadySeconds = in.MinReadySeconds + out.AutoPreserveFailedMachineMax = in.AutoPreserveFailedMachineMax return nil } @@ -897,6 +905,7 @@ func autoConvert_v1alpha1_MachineSetStatus_To_machine_MachineSetStatus(in *Machi return err } out.FailedMachines = (*[]machine.MachineSummary)(unsafe.Pointer(in.FailedMachines)) + out.AutoPreserveFailedMachineCount = in.AutoPreserveFailedMachineCount return nil } @@ -916,6 +925,7 @@ func autoConvert_machine_MachineSetStatus_To_v1alpha1_MachineSetStatus(in *machi return err } out.FailedMachines = (*[]MachineSummary)(unsafe.Pointer(in.FailedMachines)) + out.AutoPreserveFailedMachineCount = in.AutoPreserveFailedMachineCount return nil } @@ -959,8 +969,8 @@ func Convert_machine_MachineSpec_To_v1alpha1_MachineSpec(in *machine.MachineSpec } func autoConvert_v1alpha1_MachineStatus_To_machine_MachineStatus(in *MachineStatus, out *machine.MachineStatus, s conversion.Scope) error { - out.Addresses = *(*[]v1.NodeAddress)(unsafe.Pointer(&in.Addresses)) - out.Conditions = *(*[]v1.NodeCondition)(unsafe.Pointer(&in.Conditions)) + out.Addresses = 
*(*[]corev1.NodeAddress)(unsafe.Pointer(&in.Addresses)) + out.Conditions = *(*[]corev1.NodeCondition)(unsafe.Pointer(&in.Conditions)) if err := Convert_v1alpha1_LastOperation_To_machine_LastOperation(&in.LastOperation, &out.LastOperation, s); err != nil { return err } @@ -977,8 +987,8 @@ func Convert_v1alpha1_MachineStatus_To_machine_MachineStatus(in *MachineStatus, } func autoConvert_machine_MachineStatus_To_v1alpha1_MachineStatus(in *machine.MachineStatus, out *MachineStatus, s conversion.Scope) error { - out.Addresses = *(*[]v1.NodeAddress)(unsafe.Pointer(&in.Addresses)) - out.Conditions = *(*[]v1.NodeCondition)(unsafe.Pointer(&in.Conditions)) + out.Addresses = *(*[]corev1.NodeAddress)(unsafe.Pointer(&in.Addresses)) + out.Conditions = *(*[]corev1.NodeCondition)(unsafe.Pointer(&in.Conditions)) if err := Convert_machine_LastOperation_To_v1alpha1_LastOperation(&in.LastOperation, &out.LastOperation, s); err != nil { return err } @@ -1051,8 +1061,8 @@ func Convert_machine_MachineTemplateSpec_To_v1alpha1_MachineTemplateSpec(in *mac } func autoConvert_v1alpha1_NodeTemplate_To_machine_NodeTemplate(in *NodeTemplate, out *machine.NodeTemplate, s conversion.Scope) error { - out.Capacity = *(*v1.ResourceList)(unsafe.Pointer(&in.Capacity)) - out.VirtualCapacity = *(*v1.ResourceList)(unsafe.Pointer(&in.VirtualCapacity)) + out.Capacity = *(*corev1.ResourceList)(unsafe.Pointer(&in.Capacity)) + out.VirtualCapacity = *(*corev1.ResourceList)(unsafe.Pointer(&in.VirtualCapacity)) out.InstanceType = in.InstanceType out.Region = in.Region out.Zone = in.Zone @@ -1066,8 +1076,8 @@ func Convert_v1alpha1_NodeTemplate_To_machine_NodeTemplate(in *NodeTemplate, out } func autoConvert_machine_NodeTemplate_To_v1alpha1_NodeTemplate(in *machine.NodeTemplate, out *NodeTemplate, s conversion.Scope) error { - out.Capacity = *(*v1.ResourceList)(unsafe.Pointer(&in.Capacity)) - out.VirtualCapacity = *(*v1.ResourceList)(unsafe.Pointer(&in.VirtualCapacity)) + out.Capacity = 
*(*corev1.ResourceList)(unsafe.Pointer(&in.Capacity)) + out.VirtualCapacity = *(*corev1.ResourceList)(unsafe.Pointer(&in.VirtualCapacity)) out.InstanceType = in.InstanceType out.Region = in.Region out.Zone = in.Zone diff --git a/pkg/apis/machine/v1alpha1/zz_generated.deepcopy.go b/pkg/apis/machine/v1alpha1/zz_generated.deepcopy.go index 2691557c9..788f6cbee 100644 --- a/pkg/apis/machine/v1alpha1/zz_generated.deepcopy.go +++ b/pkg/apis/machine/v1alpha1/zz_generated.deepcopy.go @@ -36,6 +36,10 @@ func (in *ClassSpec) DeepCopy() *ClassSpec { func (in *CurrentStatus) DeepCopyInto(out *CurrentStatus) { *out = *in in.LastUpdateTime.DeepCopyInto(&out.LastUpdateTime) + if in.PreserveExpiryTime != nil { + in, out := &in.PreserveExpiryTime, &out.PreserveExpiryTime + *out = (*in).DeepCopy() + } return } @@ -209,6 +213,11 @@ func (in *MachineConfiguration) DeepCopyInto(out *MachineConfiguration) { *out = new(metav1.Duration) **out = **in } + if in.MachinePreserveTimeout != nil { + in, out := &in.MachinePreserveTimeout, &out.MachinePreserveTimeout + *out = new(metav1.Duration) + **out = **in + } if in.DisableHealthTimeout != nil { in, out := &in.DisableHealthTimeout, &out.DisableHealthTimeout *out = new(bool) diff --git a/pkg/apis/machine/zz_generated.deepcopy.go b/pkg/apis/machine/zz_generated.deepcopy.go index 05c40b651..d22685750 100644 --- a/pkg/apis/machine/zz_generated.deepcopy.go +++ b/pkg/apis/machine/zz_generated.deepcopy.go @@ -36,6 +36,10 @@ func (in *ClassSpec) DeepCopy() *ClassSpec { func (in *CurrentStatus) DeepCopyInto(out *CurrentStatus) { *out = *in in.LastUpdateTime.DeepCopyInto(&out.LastUpdateTime) + if in.PreserveExpiryTime != nil { + in, out := &in.PreserveExpiryTime, &out.PreserveExpiryTime + *out = (*in).DeepCopy() + } return } @@ -209,6 +213,11 @@ func (in *MachineConfiguration) DeepCopyInto(out *MachineConfiguration) { *out = new(metav1.Duration) **out = **in } + if in.MachinePreserveTimeout != nil { + in, out := &in.MachinePreserveTimeout, 
&out.MachinePreserveTimeout + *out = new(metav1.Duration) + **out = **in + } if in.DisableHealthTimeout != nil { in, out := &in.DisableHealthTimeout, &out.DisableHealthTimeout *out = new(bool) diff --git a/pkg/controller/controller_utils.go b/pkg/controller/controller_utils.go index b0e8e8f54..acd4df5c2 100644 --- a/pkg/controller/controller_utils.go +++ b/pkg/controller/controller_utils.go @@ -732,6 +732,20 @@ func (s ActiveMachines) Len() int { return len(s) } func (s ActiveMachines) Swap(i, j int) { s[i], s[j] = s[j], s[i] } func (s ActiveMachines) Less(i, j int) bool { + // Preserved machines have higher priority than other machines and will be deleted last. + // So, we check if either of the machines is preserved first. + // If one is preserved and the other is not, the preserved one is "greater" + // If both are preserved or both are not preserved, we move to the next criteria. + now := metav1.Now() + isPreserved := func(m *v1alpha1.Machine) bool { + return m.Status.CurrentStatus.PreserveExpiryTime != nil && m.Status.CurrentStatus.PreserveExpiryTime.After(now.Time) + } + isPreservedI := isPreserved(s[i]) + isPreservedJ := isPreserved(s[j]) + if isPreservedI != isPreservedJ { + return !isPreservedI // if s[i] preserved, it is "greater" and should not be deleted first, therefore, "less" is false, and vice versa + } + // Default priority for machine objects machineIPriority := 3 machineJPriority := 3 diff --git a/pkg/controller/controller_utils_test.go b/pkg/controller/controller_utils_test.go index 83e40a656..746f4189b 100644 --- a/pkg/controller/controller_utils_test.go +++ b/pkg/controller/controller_utils_test.go @@ -8,6 +8,7 @@ import ( "context" "sort" "strconv" + "time" "github.com/gardener/machine-controller-manager/pkg/util/provider/machineutils" @@ -206,6 +207,78 @@ var _ = Describe("#controllerUtils", func() { sortedMachinesInOrderOfCreationTimeStamp[2].DeepCopy(), } + sortedPreservedAndUnpreservedMachines := []*machinev1.Machine{ + newMachine( + 
&machinev1.MachineTemplateSpec{ + ObjectMeta: *newObjectMeta(objMeta, 0), + Spec: machinev1.MachineSpec{ + Class: machinev1.ClassSpec{ + Kind: AWSMachineClass, + Name: TestMachineClass, + }, + }, + }, + &machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + }, + }, + nil, + nil, + nil, + ), + newMachine( + &machinev1.MachineTemplateSpec{ + ObjectMeta: *newObjectMeta(objMeta, 0), + Spec: machinev1.MachineSpec{ + Class: machinev1.ClassSpec{ + Kind: AWSMachineClass, + Name: TestMachineClass, + }, + }, + }, + &machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + PreserveExpiryTime: &metav1.Time{ + Time: time.Now().Add(10 * time.Minute), + }, + }, + }, + nil, + nil, + nil, + ), + newMachine( + &machinev1.MachineTemplateSpec{ + ObjectMeta: *newObjectMeta(objMeta, 0), + Spec: machinev1.MachineSpec{ + Class: machinev1.ClassSpec{ + Kind: AWSMachineClass, + Name: TestMachineClass, + }, + }, + }, + &machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineRunning, + PreserveExpiryTime: &metav1.Time{ + Time: time.Now().Add(10 * time.Minute), + }, + }, + }, + nil, + nil, + nil, + ), + } + + unsortedPreservedAndUnpreservedMachines := []*machinev1.Machine{ + sortedPreservedAndUnpreservedMachines[2].DeepCopy(), + sortedPreservedAndUnpreservedMachines[0].DeepCopy(), + sortedPreservedAndUnpreservedMachines[1].DeepCopy(), + } + DescribeTable("###sort", func(data *data) { sort.Sort(ActiveMachines(data.inputMachines)) @@ -224,6 +297,10 @@ var _ = Describe("#controllerUtils", func() { inputMachines: unsortedMachinesInOrderOfCreationTimeStamp, outputMachines: sortedMachinesInOrderOfCreationTimeStamp, }), + Entry("sort on preserved and unpreserved", &data{ + inputMachines: unsortedPreservedAndUnpreservedMachines, + outputMachines: sortedPreservedAndUnpreservedMachines, + }), ) }) diff --git a/pkg/controller/deployment_machineset_util.go 
b/pkg/controller/deployment_machineset_util.go index ee2250451..e0ca3f797 100644 --- a/pkg/controller/deployment_machineset_util.go +++ b/pkg/controller/deployment_machineset_util.go @@ -25,6 +25,7 @@ package controller import ( "context" "fmt" + "github.com/gardener/machine-controller-manager/pkg/util/provider/machineutils" "reflect" "k8s.io/klog/v2" @@ -48,7 +49,8 @@ func updateMachineSetStatus(ctx context.Context, machineClient machineapi.Machin is.Status.AvailableReplicas == newStatus.AvailableReplicas && is.Generation == is.Status.ObservedGeneration && reflect.DeepEqual(is.Status.Conditions, newStatus.Conditions) && - reflect.DeepEqual(is.Status.FailedMachines, newStatus.FailedMachines) { + reflect.DeepEqual(is.Status.FailedMachines, newStatus.FailedMachines) && + is.Status.AutoPreserveFailedMachineCount == newStatus.AutoPreserveFailedMachineCount { return is, nil } @@ -66,7 +68,8 @@ func updateMachineSetStatus(ctx context.Context, machineClient machineapi.Machin fmt.Sprintf("fullyLabeledReplicas %d->%d, ", is.Status.FullyLabeledReplicas, newStatus.FullyLabeledReplicas)+ fmt.Sprintf("readyReplicas %d->%d, ", is.Status.ReadyReplicas, newStatus.ReadyReplicas)+ fmt.Sprintf("availableReplicas %d->%d, ", is.Status.AvailableReplicas, newStatus.AvailableReplicas)+ - fmt.Sprintf("sequence No: %v->%v", is.Status.ObservedGeneration, newStatus.ObservedGeneration)) + fmt.Sprintf("sequence No: %v->%v, ", is.Status.ObservedGeneration, newStatus.ObservedGeneration)+ + fmt.Sprintf("autoPreserveFailedMachineCount %v->%v", is.Status.AutoPreserveFailedMachineCount, newStatus.AutoPreserveFailedMachineCount)) is.Status = newStatus updatedIS, updateErr = c.UpdateStatus(ctx, is, metav1.UpdateOptions{}) @@ -78,7 +81,7 @@ func updateMachineSetStatus(ctx context.Context, machineClient machineapi.Machin if i >= statusUpdateRetries { break } - // Update the MachineSet with the latest resource veision for the next poll + // Update the MachineSet with the latest resource version for the
next poll if is, getErr = c.Get(ctx, is.Name, metav1.GetOptions{}); getErr != nil { // If the GET fails we can't trust status.Replicas anymore. This error // is bound to be more interesting than the update failure. @@ -99,6 +102,7 @@ func calculateMachineSetStatus(is *v1alpha1.MachineSet, filteredMachines []*v1al fullyLabeledReplicasCount := 0 readyReplicasCount := 0 availableReplicasCount := 0 + autoPreserveFailedMachineCount := 0 failedMachines := []v1alpha1.MachineSummary{} var machineSummary v1alpha1.MachineSummary @@ -124,6 +128,10 @@ func calculateMachineSetStatus(is *v1alpha1.MachineSet, filteredMachines []*v1al } failedMachines = append(failedMachines, machineSummary) } + // Count number of failed machines annotated with PreserveMachineAnnotationValuePreservedByMCM + if machine.Annotations[machineutils.PreserveMachineAnnotationKey] == machineutils.PreserveMachineAnnotationValuePreservedByMCM { + autoPreserveFailedMachineCount++ + } } // Update the FailedMachines field when we see new failures @@ -153,6 +161,7 @@ func calculateMachineSetStatus(is *v1alpha1.MachineSet, filteredMachines []*v1al newStatus.ReadyReplicas = int32(readyReplicasCount) // #nosec G115 (CWE-190) -- number of machines will not exceed MaxInt32 newStatus.AvailableReplicas = int32(availableReplicasCount) // #nosec G115 (CWE-190) -- number of machines will not exceed MaxInt32 newStatus.LastOperation.LastUpdateTime = metav1.Now() + newStatus.AutoPreserveFailedMachineCount = int32(autoPreserveFailedMachineCount) // #nosec G115 (CWE-190) -- number of machines will not exceed MaxInt32 return newStatus } diff --git a/pkg/controller/deployment_rolling.go b/pkg/controller/deployment_rolling.go index 033be6c31..3a6a14e5d 100644 --- a/pkg/controller/deployment_rolling.go +++ b/pkg/controller/deployment_rolling.go @@ -480,10 +480,13 @@ func (dc *controller) removeAutoscalerAnnotationsIfRequired(ctx context.Context, klog.Warningf("Get annotations failed for node: %s, %s", 
machine.Labels[v1alpha1.NodeLabelKey], err) return err } - // Remove the autoscaler-related annotation only if the by-mcm annotation is already set. If - // by-mcm annotation is not set, the original annotation is likely be put by the end-user for their usecases. + // by-mcm annotation is not set, the original annotation is likely be put by the end-user for their use cases. if _, exists := nodeAnnotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey]; exists { + // do not remove the autoscaler related annotation if it is added due to ongoing machine preservation. + if !machine.Status.CurrentStatus.PreserveExpiryTime.IsZero() { + continue + } err = RemoveAnnotationsOffNode( ctx, dc.targetCoreClient, diff --git a/pkg/controller/deployment_sync.go b/pkg/controller/deployment_sync.go index b0f13d440..82481065a 100644 --- a/pkg/controller/deployment_sync.go +++ b/pkg/controller/deployment_sync.go @@ -250,12 +250,14 @@ func (dc *controller) getNewMachineSet(ctx context.Context, d *v1alpha1.MachineD // Set existing new machine set's annotation annotationsUpdated := SetNewMachineSetAnnotations(d, isCopy, newRevision, true) minReadySecondsNeedsUpdate := isCopy.Spec.MinReadySeconds != d.Spec.MinReadySeconds + autoPreserveFailedMachineMaxNeedsUpdate := isCopy.Spec.AutoPreserveFailedMachineMax != d.Spec.AutoPreserveFailedMachineMax nodeTemplateUpdated := SetNewMachineSetNodeTemplate(d, isCopy, newRevision, true) machineConfigUpdated := SetNewMachineSetConfig(d, isCopy, newRevision, true) updateMachineSetClassKind := UpdateMachineSetClassKind(d, isCopy, newRevision, true) - if annotationsUpdated || minReadySecondsNeedsUpdate || nodeTemplateUpdated || machineConfigUpdated || updateMachineSetClassKind { + if annotationsUpdated || minReadySecondsNeedsUpdate || nodeTemplateUpdated || machineConfigUpdated || updateMachineSetClassKind || autoPreserveFailedMachineMaxNeedsUpdate { isCopy.Spec.MinReadySeconds = d.Spec.MinReadySeconds + 
isCopy.Spec.AutoPreserveFailedMachineMax = d.Spec.AutoPreserveFailedMachineMax return dc.controlMachineClient.MachineSets(isCopy.Namespace).Update(ctx, isCopy, metav1.UpdateOptions{}) } @@ -311,10 +313,11 @@ func (dc *controller) getNewMachineSet(ctx context.Context, d *v1alpha1.MachineD Labels: newISTemplate.Labels, }, Spec: v1alpha1.MachineSetSpec{ - Replicas: 0, - MinReadySeconds: d.Spec.MinReadySeconds, - Selector: newISSelector, - Template: newISTemplate, + Replicas: 0, + MinReadySeconds: d.Spec.MinReadySeconds, + Selector: newISSelector, + Template: newISTemplate, + AutoPreserveFailedMachineMax: d.Spec.AutoPreserveFailedMachineMax, }, } allISs := append(oldISs, &newIS) diff --git a/pkg/controller/deployment_util.go b/pkg/controller/deployment_util.go index 690b4feb1..002fd2a7f 100644 --- a/pkg/controller/deployment_util.go +++ b/pkg/controller/deployment_util.go @@ -969,7 +969,7 @@ func LabelMachinesWithHash(ctx context.Context, machineList *v1alpha1.MachineLis } // Only label the machine that doesn't already have the new hash if machine.Labels[v1alpha1.DefaultMachineDeploymentUniqueLabelKey] != hash { - _, err := UpdateMachineWithRetries(ctx, c.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, + _, err := machineutils.UpdateMachineWithRetries(ctx, c.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, func(machineToUpdate *v1alpha1.Machine) error { // Precondition: the machine doesn't contain the new hash in its label. 
if machineToUpdate.Labels[v1alpha1.DefaultMachineDeploymentUniqueLabelKey] == hash { diff --git a/pkg/controller/machineset.go b/pkg/controller/machineset.go index acd41e80e..cf4abbf20 100644 --- a/pkg/controller/machineset.go +++ b/pkg/controller/machineset.go @@ -35,19 +35,15 @@ import ( apierrors "k8s.io/apimachinery/pkg/api/errors" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/labels" - errorsutil "k8s.io/apimachinery/pkg/util/errors" utilruntime "k8s.io/apimachinery/pkg/util/runtime" "k8s.io/apimachinery/pkg/util/sets" "k8s.io/client-go/tools/cache" - "k8s.io/client-go/util/retry" "k8s.io/klog/v2" "k8s.io/utils/integer" "github.com/gardener/machine-controller-manager/pkg/apis/machine" "github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1" "github.com/gardener/machine-controller-manager/pkg/apis/machine/validation" - v1alpha1client "github.com/gardener/machine-controller-manager/pkg/client/clientset/versioned/typed/machine/v1alpha1" - v1alpha1listers "github.com/gardener/machine-controller-manager/pkg/client/listers/machine/v1alpha1" "github.com/gardener/machine-controller-manager/pkg/util/provider/machineutils" ) @@ -452,7 +448,9 @@ func (c *controller) manageReplicas(ctx context.Context, allMachines []*v1alpha1 var staleMachines []*v1alpha1.Machine for _, m := range machinesWithoutUpdateSuccessfulLabel { if machineutils.IsMachineFailed(m) { - staleMachines = append(staleMachines, m) + if c.shouldFailedMachineBeTerminated(m) { + staleMachines = append(staleMachines, m) + } } } @@ -554,6 +552,8 @@ func (c *controller) reconcileClusterMachineSet(key string) error { return err } + filteredMachines = c.manageAutoPreservationOfFailedMachines(ctx, filteredMachines, machineSet) + // TODO: Fix working of expectations to reflect correct behaviour // machineSetNeedsSync := c.expectations.SatisfiedExpectations(key) var manageReplicasErr error @@ -591,7 +591,7 @@ func (c *controller) reconcileClusterMachineSet(key string) error 
{ // Multiple things could lead to this update failing. Requeuing the machine set ensures // Returning an error causes a requeue without forcing a hotloop if !apierrors.IsNotFound(err) { - klog.Errorf("Update machineSet %s failed with: %s", machineSet.Name, err) + klog.Errorf("update machineSet %s failed with: %s", machineSet.Name, err) } return err } @@ -672,8 +672,8 @@ func getMachinesToDelete(filteredMachines []*v1alpha1.Machine, diff int) []*v1al func getMachineKeys(machines []*v1alpha1.Machine) []string { machineKeys := make([]string, 0, len(machines)) - for _, machine := range machines { - machineKeys = append(machineKeys, MachineKey(machine)) + for _, mc := range machines { + machineKeys = append(machineKeys, MachineKey(mc)) } return machineKeys } @@ -736,8 +736,8 @@ func (c *controller) terminateMachines(ctx context.Context, inactiveMachines []* defer close(errCh) wg.Add(numOfInactiveMachines) - for _, machine := range inactiveMachines { - go c.prepareMachineForDeletion(ctx, machine, machineSet, &wg, errCh) + for _, m := range inactiveMachines { + go c.prepareMachineForDeletion(ctx, m, machineSet, &wg, errCh) } wg.Wait() @@ -832,7 +832,7 @@ func (c *controller) updateMachineStatus( clone, err = c.controlMachineClient.Machines(clone.Namespace).UpdateStatus(ctx, clone, metav1.UpdateOptions{}) if err != nil { // Keep retrying until update goes through - klog.V(3).Infof("Warning: Updated failed, retrying, error: %q", err) + klog.V(3).Infof("Warning: Update failed, retrying, error: %q", err) return c.updateMachineStatus(ctx, machine, lastOperation, currentStatus) } return clone, nil @@ -857,35 +857,141 @@ func isMachineStatusEqual(s1, s2 v1alpha1.MachineStatus) bool { return apiequality.Semantic.DeepEqual(s1Copy.LastOperation, s2Copy.LastOperation) && apiequality.Semantic.DeepEqual(s1Copy.CurrentStatus, s2Copy.CurrentStatus) } -// see https://github.com/kubernetes/kubernetes/issues/21479 -type updateMachineFunc func(machine *v1alpha1.Machine) error - -// 
UpdateMachineWithRetries updates a machine with given applyUpdate function. Note that machine not found error is ignored. -// The returned bool value can be used to tell if the machine is actually updated. -func UpdateMachineWithRetries(ctx context.Context, machineClient v1alpha1client.MachineInterface, machineLister v1alpha1listers.MachineLister, namespace, name string, applyUpdate updateMachineFunc) (*v1alpha1.Machine, error) { - var machine *v1alpha1.Machine +// shouldFailedMachineBeTerminated checks if the failed machine is already preserved, in the process of being preserved +// or if it is a candidate for auto-preservation. If none of these conditions are met, it returns true indicating +// that the failed machine should be terminated. +func (c *controller) shouldFailedMachineBeTerminated(machine *v1alpha1.Machine) bool { + // if preserve expiry time is set and is in the future, machine is already preserved + if machine.Status.CurrentStatus.PreserveExpiryTime != nil { + if machine.Status.CurrentStatus.PreserveExpiryTime.After(time.Now()) { + klog.V(3).Infof("Failed machine %q is preserved until %v", machine.Name, machine.Status.CurrentStatus.PreserveExpiryTime) + return false + } + klog.V(3).Infof("Preservation of failed machine %q has timed out at %v", machine.Name, machine.Status.CurrentStatus.PreserveExpiryTime) + return true + } + preserveValue, err := c.findEffectivePreserveValue(machine) + if err != nil { + // in case of error fetching node or annotations, we don't want to block deletion of failed machines, so we return true + klog.Errorf("error finding effective preserve value for machine %q: %v. 
Proceeding with termination of the machine.", machine.Name, err) + return true + } + switch preserveValue { + case machineutils.PreserveMachineAnnotationValueWhenFailed, machineutils.PreserveMachineAnnotationValueNow, machineutils.PreserveMachineAnnotationValuePreservedByMCM: // this is in case preservation process is not complete yet + return false + case machineutils.PreserveMachineAnnotationValueFalse: + return true + default: + return true + } +} - retryErr := retry.RetryOnConflict(retry.DefaultBackoff, func() error { - var err error - machine, err = machineLister.Machines(namespace).Get(name) +// manageAutoPreservationOfFailedMachines annotates failed machines with preserve=auto-preserved annotation +// to trigger preservation of the machines, by the machine controller, up to the limit defined in the +// MachineSet's AutoPreserveFailedMachineMax field. If the AutoPreserveFailedMachineMax limit is breached, it removes the preserve=auto-preserved annotation from the oldest annotated machines. +func (c *controller) manageAutoPreservationOfFailedMachines(ctx context.Context, machines []*v1alpha1.Machine, machineSet *v1alpha1.MachineSet) []*v1alpha1.Machine { + autoPreservationCapacityRemaining := machineSet.Spec.AutoPreserveFailedMachineMax - machineSet.Status.AutoPreserveFailedMachineCount + if autoPreservationCapacityRemaining == 0 { + // no capacity remaining, nothing to do + return machines + } else if autoPreservationCapacityRemaining < 0 { // when autoPreserveFailedMachineMax is decreased, it can be negative. 
+ numStillExceeding := c.stopAutoPreservationForMachines(ctx, machines, int(-autoPreservationCapacityRemaining)) + if numStillExceeding > 0 { + klog.V(2).Infof("Attempted to decrease count of auto-preserved machines, but there are still %d violations of AutoPreserveFailedMachineMax.", numStillExceeding) + } + return machines + } + var autoPreservationCandidates []*v1alpha1.Machine + var others []*v1alpha1.Machine + for _, m := range machines { + // check if machine is already annotated for preservation, if yes, skip. Machine controller will take care of the rest. + if machineutils.IsMachineFailed(m) && !machineutils.PreventAutoPreserveAnnotationValues.Has(m.Annotations[machineutils.PreserveMachineAnnotationKey]) { + autoPreservationCandidates = append(autoPreservationCandidates, m) + } else { + others = append(others, m) + } + } + sort.Slice(autoPreservationCandidates, func(i, j int) bool { + return autoPreservationCandidates[i].CreationTimestamp.After(autoPreservationCandidates[j].CreationTimestamp.Time) + }) + for index, m := range autoPreservationCandidates { + if autoPreservationCapacityRemaining == 0 { + break + } + klog.V(2).Infof("Annotating failed machine %q for auto-preservation as part of machine set %q", m.Name, machineSet.Name) + updatedMachine, err := machineutils.UpdateMachineWithRetries(ctx, c.controlMachineClient.Machines(m.Namespace), c.machineLister, m.Namespace, m.Name, addAutoPreserveAnnotationOnMachine) if err != nil { - return err + klog.V(2).Infof("Error annotating machine %q for auto-preservation: %v", m.Name, err) + // since addAutoPreserveAnnotation uses retries internally, on error we can continue with other machines + continue } - machine = machine.DeepCopy() - // Apply the update, then attempt to push it to the apiserver. 
- if applyErr := applyUpdate(machine); applyErr != nil { - return applyErr + autoPreservationCandidates[index] = updatedMachine + autoPreservationCapacityRemaining-- + } + return append(autoPreservationCandidates, others...) +} + +func (c *controller) stopAutoPreservationForMachines(ctx context.Context, machines []*v1alpha1.Machine, numToStop int) int { + var autoPreservedMachines []*v1alpha1.Machine + for _, m := range machines { + if m.Annotations[machineutils.PreserveMachineAnnotationKey] == machineutils.PreserveMachineAnnotationValuePreservedByMCM { + autoPreservedMachines = append(autoPreservedMachines, m) } - machine, err = machineClient.Update(ctx, machine, metav1.UpdateOptions{}) - return err - }) + } + numOfAutoPreservedMachines := len(autoPreservedMachines) + if numOfAutoPreservedMachines == 0 { + return numToStop + } + if numOfAutoPreservedMachines > numToStop { + sort.Slice(autoPreservedMachines, func(i, j int) bool { + return autoPreservedMachines[i].CreationTimestamp.Before(&autoPreservedMachines[j].CreationTimestamp) + }) + } + for index, m := range autoPreservedMachines { + if numToStop == 0 { + break + } + klog.V(2).Infof("Removing auto-preservation annotation from machine %q as AutoPreserveFailedMachineMax is breached", m.Name) + updatedMachine, err := machineutils.UpdateMachineWithRetries(ctx, c.controlMachineClient.Machines(m.Namespace), c.machineLister, m.Namespace, m.Name, removeAutoPreserveAnnotationFromMachine) + if err != nil { + klog.Warningf("Error removing %q=%q annotation from machine %q: %v.", machineutils.PreserveMachineAnnotationKey, machineutils.PreserveMachineAnnotationValuePreservedByMCM, m.Name, err) + continue + } + autoPreservedMachines[index] = updatedMachine + numToStop-- + } + return numToStop +} - // Ignore the precondition violated error, this machine is already updated - // with the desired label. 
- if retryErr == errorsutil.ErrPreconditionViolated { - klog.V(4).Infof("Machine %s precondition doesn't hold, skip updating it.", name) - retryErr = nil +func addAutoPreserveAnnotationOnMachine(machineToUpdate *v1alpha1.Machine) error { + if machineToUpdate.Annotations == nil { + machineToUpdate.Annotations = make(map[string]string) } + machineToUpdate.Annotations[machineutils.PreserveMachineAnnotationKey] = machineutils.PreserveMachineAnnotationValuePreservedByMCM + return nil +} + +func removeAutoPreserveAnnotationFromMachine(machineToUpdate *v1alpha1.Machine) error { + delete(machineToUpdate.Annotations, machineutils.PreserveMachineAnnotationKey) + return nil +} - return machine, retryErr +func (c *controller) findEffectivePreserveValue(machine *v1alpha1.Machine) (string, error) { + var nodeAnnotationValue, machineAnnotationValue, lANodeAnnotationValue string + machineAnnotationValue = machine.Annotations[machineutils.PreserveMachineAnnotationKey] + lANodeAnnotationValue = machine.Annotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] + nodeName := machine.Labels[v1alpha1.NodeLabelKey] + if nodeName != "" { + node, err := c.nodeLister.Get(nodeName) + if err != nil { + klog.Errorf("error fetching node %q for machine %q: %v", nodeName, machine.Name, err) + return "", err + } + nodeAnnotationValue = node.Annotations[machineutils.PreserveMachineAnnotationKey] + } + if nodeAnnotationValue == "" && lANodeAnnotationValue == "" { + return machineAnnotationValue, nil + } else { + return nodeAnnotationValue, nil + } } diff --git a/pkg/controller/machineset_test.go b/pkg/controller/machineset_test.go index 88d847017..7f807b352 100644 --- a/pkg/controller/machineset_test.go +++ b/pkg/controller/machineset_test.go @@ -8,6 +8,7 @@ import ( "context" "errors" "fmt" + corev1 "k8s.io/api/core/v1" "sync" "time" @@ -1376,18 +1377,34 @@ var _ = Describe("machineset", func() { }, } }) - - // Testcase: It should return the Failed machines first. 
It("should return the Failed machines first.", func() { - stop := make(chan struct{}) - defer close(stop) diff = 1 filteredMachines := []*machinev1.Machine{testActiveMachine1, testFailedMachine1} machinesToDelete := getMachinesToDelete(filteredMachines, diff) - Expect(len(machinesToDelete)).To(Equal(len(filteredMachines) - diff)) + Expect(len(machinesToDelete)).To(Equal(diff)) Expect(machinesToDelete[0].Name).To(Equal(testFailedMachine1.Name)) }) + It("should prioritise non-preserved machines for deletion.", func() { + diff = 2 + testPreservedFailedMachine := testFailedMachine1.DeepCopy() + testPreservedFailedMachine.Status.CurrentStatus.PreserveExpiryTime = &metav1.Time{Time: time.Now().Add(1 * time.Hour)} + filteredMachines := []*machinev1.Machine{testActiveMachine1, testFailedMachine1, testPreservedFailedMachine} + machinesToDelete := getMachinesToDelete(filteredMachines, diff) + Expect(len(machinesToDelete)).To(Equal(diff)) + // expect machinesToDelete to not contain testPreservedFailedMachine + Expect(machinesToDelete).ToNot(ContainElement(testPreservedFailedMachine)) + }) + It("should include preserved machine when needed to maintain replica count", func() { + diff = 2 + testPreservedFailedMachine := testFailedMachine1.DeepCopy() + testPreservedFailedMachine.Status.CurrentStatus.PreserveExpiryTime = &metav1.Time{Time: time.Now().Add(1 * time.Hour)} + filteredMachines := []*machinev1.Machine{testActiveMachine1, testPreservedFailedMachine} + machinesToDelete := getMachinesToDelete(filteredMachines, diff) + Expect(len(machinesToDelete)).To(Equal(diff)) + // expect machinesToDelete to contain testPreservedFailedMachine + Expect(machinesToDelete).To(ContainElement(testPreservedFailedMachine)) + }) }) Describe("#getMachineKeys", func() { @@ -1813,4 +1830,363 @@ var _ = Describe("machineset", func() { Expect(testMachineSet.Finalizers).To(Equal(finalizers)) }) }) + + Describe("#manageAutoPreservationOfFailedMachines", func() { + type setup struct { + 
autoPreserveFailedMachineCount int32 + autoPreserveFailedMachineMax int32 + additionalMachines []*machinev1.Machine + replicas int32 + } + type expect struct { + preservedMachineCount int + } + type testCase struct { + setup setup + expect expect + } + + DescribeTable("#manageAutoPreservationOfFailedMachines scenarios", func(tc testCase) { + stop := make(chan struct{}) + defer close(stop) + testMachineSet := &machinev1.MachineSet{ + ObjectMeta: metav1.ObjectMeta{ + Name: "MachineSet-test", + Namespace: testNamespace, + Labels: map[string]string{ + "test-label": "test-label", + }, + UID: "1234567", + }, + Spec: machinev1.MachineSetSpec{ + Replicas: tc.setup.replicas, + Template: machinev1.MachineTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{ + Labels: map[string]string{ + "test-label": "test-label", + }, + }, + }, + Selector: &metav1.LabelSelector{ + MatchLabels: map[string]string{ + "test-label": "test-label", + }, + }, + AutoPreserveFailedMachineMax: tc.setup.autoPreserveFailedMachineMax, + }, + Status: machinev1.MachineSetStatus{ + AutoPreserveFailedMachineCount: tc.setup.autoPreserveFailedMachineCount, + }, + } + testMachine1 := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-1", + Namespace: testNamespace, + CreationTimestamp: metav1.Time{Time: time.Now().Add(-2 * time.Hour)}, + }, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: MachineFailed, + }, + }, + } + testMachine2 := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-2", + Namespace: testNamespace, + CreationTimestamp: metav1.Time{Time: time.Now().Add(-1 * time.Hour)}, + }, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: MachineFailed, + }, + }, + } + testMachine3 := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-3", + Namespace: testNamespace, + CreationTimestamp: metav1.Time{Time: time.Now().Add(-2 * time.Hour)}, + }, + Status: machinev1.MachineStatus{ + 
CurrentStatus: machinev1.CurrentStatus{ + Phase: MachineRunning, + }, + }, + } + testMachine4 := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-4", + Namespace: testNamespace, + CreationTimestamp: metav1.Time{Time: time.Now().Add(-2 * time.Hour)}, + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: machineutils.PreserveMachineAnnotationValueFalse, + }, + }, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: MachineFailed, + }, + }, + } + objects := []runtime.Object{testMachineSet, testMachine1, testMachine2, testMachine3, testMachine4} + for _, m := range tc.setup.additionalMachines { + objects = append(objects, m) + } + c, trackers := createController(stop, testNamespace, objects, nil, nil) + defer trackers.Stop() + waitForCacheSync(stop, c) + machinesList := []*machinev1.Machine{testMachine1, testMachine2, testMachine3, testMachine4} + machinesList = append(machinesList, tc.setup.additionalMachines...) 
+ c.manageAutoPreservationOfFailedMachines(context.TODO(), machinesList, testMachineSet) + waitForCacheSync(stop, c) + updatedMachine1, _ := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), testMachine1.Name, metav1.GetOptions{}) + updatedMachine2, _ := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), testMachine2.Name, metav1.GetOptions{}) + updatedMachine3, _ := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), testMachine3.Name, metav1.GetOptions{}) + updatedMachine4, _ := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), testMachine4.Name, metav1.GetOptions{}) + preservedCount := 0 + if updatedMachine1.Annotations != nil && updatedMachine1.Annotations[machineutils.PreserveMachineAnnotationKey] == machineutils.PreserveMachineAnnotationValuePreservedByMCM { + preservedCount++ + } + if updatedMachine2.Annotations != nil && updatedMachine2.Annotations[machineutils.PreserveMachineAnnotationKey] == machineutils.PreserveMachineAnnotationValuePreservedByMCM { + preservedCount++ + } + // Running machine should not be auto-preserved in any of the cases + Expect(updatedMachine3.Annotations[machineutils.PreserveMachineAnnotationKey]).To(BeEmpty()) + // Machine with explicit preserve annotation set to false should not be auto-preserved + Expect(updatedMachine4.Annotations[machineutils.PreserveMachineAnnotationKey]).To(Equal(machineutils.PreserveMachineAnnotationValueFalse)) + + for _, m := range tc.setup.additionalMachines { + updatedMachine, _ := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), m.Name, metav1.GetOptions{}) + if updatedMachine.Annotations[machineutils.PreserveMachineAnnotationKey] == machineutils.PreserveMachineAnnotationValuePreservedByMCM { + preservedCount++ + } + } + Expect(preservedCount).To(Equal(tc.expect.preservedMachineCount)) + }, + Entry("should trigger auto preservation of 1 failed machine if AutoPreserveFailedMachineMax is 1 and 
AutoPreserveFailedMachineCount is 0", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 0, + autoPreserveFailedMachineMax: 1, + }, + expect: expect{ + preservedMachineCount: 1, + }, + }), + Entry("should not trigger auto preservation of failed machines if AutoPreserveFailedMachineMax is 0", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 0, + autoPreserveFailedMachineMax: 0, + }, + expect: expect{ + preservedMachineCount: 0, + }, + }), + Entry("should not trigger auto preservation of failed machines if AutoPreserveFailedMachineCount has reached AutoPreserveFailedMachineMax", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 2, + autoPreserveFailedMachineMax: 2, + }, + expect: expect{ + preservedMachineCount: 0, + }, + }), + Entry("should trigger auto preservation of both failed machines if AutoPreserveFailedMachineCount is 0 and AutoPreserveFailedMachineMax is 2", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 0, + autoPreserveFailedMachineMax: 2, + }, + expect: expect{ + preservedMachineCount: 2, + }, + }), + Entry("should not trigger auto preservation of failed machine annotated with preserve=false even if AutoPreserveFailedMachineCount < AutoPreserveFailedMachineMax", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 0, + autoPreserveFailedMachineMax: 3, + }, + expect: expect{ + preservedMachineCount: 2, + }, + }), + Entry("should stop auto preservation of machines annotated with preserve=auto-preserve if AutoPreserveFailedMachineCount > AutoPreserveFailedMachineMax", testCase{ + setup: setup{ + autoPreserveFailedMachineCount: 1, + autoPreserveFailedMachineMax: 0, + additionalMachines: []*machinev1.Machine{ + { + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-5", + Namespace: testNamespace, + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: machineutils.PreserveMachineAnnotationValuePreservedByMCM, + }, + }, + Status: machinev1.MachineStatus{ + CurrentStatus: 
machinev1.CurrentStatus{ + Phase: MachineFailed, + PreserveExpiryTime: &metav1.Time{Time: time.Now().Add(1 * time.Hour)}, + }, + }, + }, + }, + }, + expect: expect{ + preservedMachineCount: 0, + }, + }), + ) + }) + + Describe("#shouldFailedMachineBeTerminated", func() { + type setup struct { + preserveExpiryTime *metav1.Time + nodeName string + nodeAnnotationValue string + machineAnnotationValue string + laNodeAnnotationValue string + } + type expect struct { + result bool + } + type testCase struct { + setup setup + expect expect + } + + DescribeTable("shouldFailedMachineBeTerminated test cases", func(tc testCase) { + stop := make(chan struct{}) + defer close(stop) + + var controlMachineObjects []runtime.Object + var targetCoreObjects []runtime.Object + + machine := machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-machine", + Namespace: "default", + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: tc.setup.machineAnnotationValue, + machineutils.LastAppliedNodePreserveValueAnnotationKey: tc.setup.laNodeAnnotationValue, + }, + Labels: map[string]string{ + machinev1.NodeLabelKey: tc.setup.nodeName, + }, + }, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + PreserveExpiryTime: tc.setup.preserveExpiryTime, + }, + }, + } + controlMachineObjects = append(controlMachineObjects, &machine) + if tc.setup.nodeName != "" { + node := &corev1.Node{ + ObjectMeta: metav1.ObjectMeta{ + Name: tc.setup.nodeName, + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: tc.setup.nodeAnnotationValue, + }, + }, + Status: corev1.NodeStatus{ + Conditions: []corev1.NodeCondition{}, + }, + } + targetCoreObjects = append(targetCoreObjects, node) + } + c, trackers := createController(stop, testNamespace, controlMachineObjects, nil, targetCoreObjects) + defer trackers.Stop() + waitForCacheSync(stop, c) + result := c.shouldFailedMachineBeTerminated(&machine) + + 
Expect(result).To(Equal(tc.expect.result)) + }, + Entry("should return false if preserve expiry time is in the future", testCase{ + setup: setup{ + preserveExpiryTime: &metav1.Time{Time: metav1.Now().Add(1 * time.Hour)}, + machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow, + nodeName: "test-node", + }, + expect: expect{ + result: false, + }, + }), + Entry("should return true if machine is annotated with preserve=false", testCase{ + setup: setup{ + machineAnnotationValue: machineutils.PreserveMachineAnnotationValueFalse, + nodeName: "test-node", + }, + expect: expect{ + result: true, + }, + }), + Entry("should return true if node is annotated with preserve=false", testCase{ + setup: setup{ + nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueFalse, + nodeName: "test-node", + }, + expect: expect{ + result: true, + }, + }), + Entry("should return false if machine is annotated with preserve=now, and node has not been annotated, and preserveExpiryTime is not yet set", testCase{ + setup: setup{ + machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow, + nodeName: "test-node", + }, + expect: expect{ + result: false, + }, + }), + Entry("should return false if node is annotated with preserve=now, and preserveExpiryTime is not yet set", testCase{ + setup: setup{ + nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueNow, + nodeName: "test-node", + }, + expect: expect{ + result: false, + }, + }), + Entry("should return false if machine is annotated with preserve=when-failed, and node has not been annotated", testCase{ + setup: setup{ + machineAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed, + nodeName: "test-node", + }, + expect: expect{ + result: false, + }, + }), + Entry("should return false if node is annotated with preserve=when-failed", testCase{ + setup: setup{ + nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed, + nodeName: "test-node", + }, + expect: expect{ + 
result: false, + }, + }), + Entry("should return true if preservation has timed out", testCase{ + setup: setup{ + preserveExpiryTime: &metav1.Time{Time: metav1.Now().Add(-1 * time.Second)}, + nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueNow, + nodeName: "test-node", + }, + expect: expect{ + result: true, + }, + }), + Entry("should return true if laNodePreserveValue is not empty, machineAnnotationValue is not empty and nodeAnnotationValue is empty, indicating that node Annotation Value was deleted", testCase{ + setup: setup{ + laNodeAnnotationValue: machineutils.PreserveMachineAnnotationValueNow, + machineAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed, + nodeName: "test-node", + nodeAnnotationValue: "", + }, + expect: expect{ + result: true, + }, + }), + ) + }) }) diff --git a/pkg/controller/machineset_util.go b/pkg/controller/machineset_util.go index 78ce9d803..aa5e5b5cd 100644 --- a/pkg/controller/machineset_util.go +++ b/pkg/controller/machineset_util.go @@ -95,7 +95,7 @@ func (c *controller) syncMachinesNodeTemplates(ctx context.Context, machineList nodeTemplateChanged := copyMachineSetNodeTemplatesToMachines(machineSet, machine) // Only sync the machine that doesn't already have the latest nodeTemplate. if nodeTemplateChanged { - _, err := UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, + _, err := machineutils.UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, func(_ *v1alpha1.Machine) error { return nil }) @@ -118,7 +118,7 @@ func (c *controller) syncMachinesClassKind(ctx context.Context, machineList []*v classKindChanged := copyMachineSetClassKindToMachines(machineSet, machine) // Only sync the machine that doesn't already have the matching classKind. 
if classKindChanged { - _, err := UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, + _, err := machineutils.UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, func(_ *v1alpha1.Machine) error { return nil }) @@ -161,7 +161,7 @@ func (c *controller) syncMachinesConfig(ctx context.Context, machineList []*v1al configChanged := copyMachineSetConfigToMachines(machineSet, machine) // Only sync the machine that doesn't already have the latest config. if configChanged { - _, err := UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, + _, err := machineutils.UpdateMachineWithRetries(ctx, controlClient.Machines(machine.Namespace), machineLister, machine.Namespace, machine.Name, func(_ *v1alpha1.Machine) error { return nil }) diff --git a/pkg/openapi/openapi_generated.go b/pkg/openapi/openapi_generated.go index c4ee08c09..862a74f98 100644 --- a/pkg/openapi/openapi_generated.go +++ b/pkg/openapi/openapi_generated.go @@ -394,6 +394,12 @@ func schema_pkg_apis_machine_v1alpha1_CurrentStatus(ref common.ReferenceCallback Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Time"), }, }, + "preserveExpiryTime": { + SchemaProps: spec.SchemaProps{ + Description: "PreserveExpiryTime is the time at which MCM will stop preserving the machine", + Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Time"), + }, + }, }, }, }, @@ -682,6 +688,12 @@ func schema_pkg_apis_machine_v1alpha1_MachineConfiguration(ref common.ReferenceC Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"), }, }, + "machinePreserveTimeout": { + SchemaProps: spec.SchemaProps{ + Description: "MachinePreserveTimeout is the timeout after which the machine preservation is stopped", + Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"), + }, + }, "disableHealthTimeout": { SchemaProps: spec.SchemaProps{ Description: 
"DisableHealthTimeout if set to true, health timeout will be ignored. Leading to machine never being declared failed. This is intended to be used only for in-place updates.", @@ -943,6 +955,13 @@ func schema_pkg_apis_machine_v1alpha1_MachineDeploymentSpec(ref common.Reference Format: "int32", }, }, + "autoPreserveFailedMachineMax": { + SchemaProps: spec.SchemaProps{ + Description: "The maximum number of failed machines in the machine deployment that can be auto-preserved. In the gardener context, this number is derived from the AutoPreserveFailedMachineMax set at the worker level, distributed amongst the worker's machine deployments", + Type: []string{"integer"}, + Format: "int32", + }, + }, }, Required: []string{"template"}, }, @@ -1317,6 +1336,12 @@ func schema_pkg_apis_machine_v1alpha1_MachineSetSpec(ref common.ReferenceCallbac Format: "int32", }, }, + "autoPreserveFailedMachineMax": { + SchemaProps: spec.SchemaProps{ + Type: []string{"integer"}, + Format: "int32", + }, + }, }, }, }, @@ -1402,6 +1427,13 @@ func schema_pkg_apis_machine_v1alpha1_MachineSetStatus(ref common.ReferenceCallb }, }, }, + "autoPreserveFailedMachineCount": { + SchemaProps: spec.SchemaProps{ + Description: "AutoPreserveFailedMachineCount has a count of the number of failed machines in the machineset that are currently auto-preserved", + Type: []string{"integer"}, + Format: "int32", + }, + }, }, }, }, @@ -1462,6 +1494,12 @@ func schema_pkg_apis_machine_v1alpha1_MachineSpec(ref common.ReferenceCallback) Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"), }, }, + "machinePreserveTimeout": { + SchemaProps: spec.SchemaProps{ + Description: "MachinePreserveTimeout is the timeout after which the machine preservation is stopped", + Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"), + }, + }, "disableHealthTimeout": { SchemaProps: spec.SchemaProps{ Description: "DisableHealthTimeout if set to true, health timeout will be ignored. Leading to machine never being declared failed. 
This is intended to be used only for in-place updates.", diff --git a/pkg/util/nodeops/conditions.go b/pkg/util/nodeops/conditions.go index b02257d73..f3bafad77 100644 --- a/pkg/util/nodeops/conditions.go +++ b/pkg/util/nodeops/conditions.go @@ -66,9 +66,10 @@ func GetNodeCondition(ctx context.Context, c clientset.Interface, nodeName strin } // AddOrUpdateConditionsOnNode adds a condition to the node's status -func AddOrUpdateConditionsOnNode(ctx context.Context, c clientset.Interface, nodeName string, condition v1.NodeCondition) error { +func AddOrUpdateConditionsOnNode(ctx context.Context, c clientset.Interface, nodeName string, condition v1.NodeCondition) (*v1.Node, error) { firstTry := true - return clientretry.RetryOnConflict(Backoff, func() error { + var updatedNode *v1.Node + err := clientretry.RetryOnConflict(Backoff, func() error { var err error var oldNode *v1.Node // First we try getting node from the API server cache, as it's cheaper. If it fails @@ -87,20 +88,20 @@ func AddOrUpdateConditionsOnNode(ctx context.Context, c clientset.Interface, nod var newNode *v1.Node oldNodeCopy := oldNode newNode = AddOrUpdateCondition(oldNodeCopy, condition) - return UpdateNodeConditions(ctx, c, nodeName, oldNode, newNode) + updatedNode, err = UpdateNodeConditions(ctx, c, nodeName, oldNode, newNode) + return err }) + return updatedNode, err } // UpdateNodeConditions is for updating the node conditions from oldNode to the newNode -// using the nodes Update() method -func UpdateNodeConditions(ctx context.Context, c clientset.Interface, nodeName string, oldNode *v1.Node, newNode *v1.Node) error { +// using the node's UpdateStatus() method +func UpdateNodeConditions(ctx context.Context, c clientset.Interface, nodeName string, oldNode *v1.Node, newNode *v1.Node) (*v1.Node, error) { newNodeClone := oldNode.DeepCopy() newNodeClone.Status.Conditions = newNode.Status.Conditions - - _, err := c.CoreV1().Nodes().UpdateStatus(ctx, newNodeClone, metav1.UpdateOptions{}) + 
updatedNode, err := c.CoreV1().Nodes().UpdateStatus(ctx, newNodeClone, metav1.UpdateOptions{}) if err != nil { - return fmt.Errorf("failed to create update conditions for node %q: %v", nodeName, err) + return nil, fmt.Errorf("failed to create/update conditions on node %q: %v", nodeName, err) } - - return nil + return updatedNode, nil } diff --git a/pkg/util/provider/app/options/options.go b/pkg/util/provider/app/options/options.go index 9e66b4b27..7626749a0 100644 --- a/pkg/util/provider/app/options/options.go +++ b/pkg/util/provider/app/options/options.go @@ -78,6 +78,7 @@ func NewMCServer() *MCServer { MachineSafetyOrphanVMsPeriod: metav1.Duration{Duration: 15 * time.Minute}, MachineSafetyAPIServerStatusCheckPeriod: metav1.Duration{Duration: 1 * time.Minute}, MachineSafetyAPIServerStatusCheckTimeout: metav1.Duration{Duration: 30 * time.Second}, + MachinePreserveTimeout: metav1.Duration{Duration: 72 * time.Hour}, }, }, } @@ -110,6 +111,7 @@ func (s *MCServer) AddFlags(fs *pflag.FlagSet) { fs.DurationVar(&s.SafetyOptions.PvDetachTimeout.Duration, "machine-pv-detach-timeout", s.SafetyOptions.PvDetachTimeout.Duration, "Timeout (in duration) used while waiting for detach of PV while evicting/deleting pods") fs.DurationVar(&s.SafetyOptions.PvReattachTimeout.Duration, "machine-pv-reattach-timeout", s.SafetyOptions.PvReattachTimeout.Duration, "Timeout (in duration) used while waiting for reattach of PV onto a different node") fs.DurationVar(&s.SafetyOptions.MachineSafetyAPIServerStatusCheckTimeout.Duration, "machine-safety-apiserver-statuscheck-timeout", s.SafetyOptions.MachineSafetyAPIServerStatusCheckTimeout.Duration, "Timeout (in duration) for which the APIServer can be down before declare the machine controller frozen by safety controller") + fs.DurationVar(&s.SafetyOptions.MachinePreserveTimeout.Duration, "machine-preserve-timeout", s.SafetyOptions.MachinePreserveTimeout.Duration, "Duration for which a failed machine should be preserved if it has the appropriate 
preserve annotation set.") fs.DurationVar(&s.SafetyOptions.MachineSafetyOrphanVMsPeriod.Duration, "machine-safety-orphan-vms-period", s.SafetyOptions.MachineSafetyOrphanVMsPeriod.Duration, "Time period (in duration) used to poll for orphan VMs by safety controller.") fs.DurationVar(&s.SafetyOptions.MachineSafetyAPIServerStatusCheckPeriod.Duration, "machine-safety-apiserver-statuscheck-period", s.SafetyOptions.MachineSafetyAPIServerStatusCheckPeriod.Duration, "Time period (in duration) used to poll for APIServer's health by safety controller") @@ -187,9 +189,11 @@ func (s *MCServer) Validate() error { if s.SafetyOptions.MachineSafetyAPIServerStatusCheckPeriod.Duration < s.SafetyOptions.MachineSafetyAPIServerStatusCheckTimeout.Duration { errs = append(errs, fmt.Errorf("machine safety APIServer status check period should not be less than APIServer status check timeout")) } + if s.SafetyOptions.MachinePreserveTimeout.Duration < 0 { + errs = append(errs, fmt.Errorf("machine preserve timeout should be a non-negative number: got %v", s.SafetyOptions.MachinePreserveTimeout.Duration)) + } if s.ControlKubeconfig == "" && s.TargetKubeconfig == constants.TargetKubeconfigDisabledValue { errs = append(errs, fmt.Errorf("--control-kubeconfig cannot be empty if --target-kubeconfig=%s is specified", constants.TargetKubeconfigDisabledValue)) } - return utilerrors.NewAggregate(errs) } diff --git a/pkg/util/provider/machinecontroller/machine.go b/pkg/util/provider/machinecontroller/machine.go index 108f6f1d1..1a4b23494 100644 --- a/pkg/util/provider/machinecontroller/machine.go +++ b/pkg/util/provider/machinecontroller/machine.go @@ -58,9 +58,23 @@ func (c *controller) updateMachine(oldObj, newObj any) { klog.Errorf("couldn't convert to machine resource from object") return } + // to reconcile on change in annotations related to preservation + if oldMachine.Annotations[machineutils.PreserveMachineAnnotationKey] != newMachine.Annotations[machineutils.PreserveMachineAnnotationKey] { + 
c.enqueueMachine(newObj, "handling machine object preservation related UPDATE event") + return + } + // this check is required to enqueue a machine, annotated with the preservation key, whose phase has changed. + // if annotated with "when-failed", the machine should be preserved on Failure + // if annotated with "now", the machine should be drained on Failure + // if machine phase changes from Failed to Running, machine preservation should be stopped + _, exists := newMachine.Annotations[machineutils.PreserveMachineAnnotationKey] + if exists && oldMachine.Status.CurrentStatus.Phase != newMachine.Status.CurrentStatus.Phase { + c.enqueueMachine(newObj, "handling preserved machine phase update") + return + } if oldMachine.Generation == newMachine.Generation { - klog.V(3).Infof("Skipping non-spec updates for machine %s", oldMachine.Name) + klog.V(3).Infof("Skipping other non-spec updates for machine %q", oldMachine.Name) return } @@ -234,6 +248,11 @@ func (c *controller) reconcileClusterMachine(ctx context.Context, machine *v1alp } } + retry, err = c.manageMachinePreservation(ctx, machine) + if err != nil { + return retry, err + } + if machine.Spec.ProviderID == "" || machine.Status.CurrentStatus.Phase == "" || machine.Status.CurrentStatus.Phase == v1alpha1.MachineCrashLoopBackOff { return c.triggerCreationFlow( ctx, @@ -441,7 +460,6 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque uninitializedMachine = true klog.Infof("VM instance associated with machine %s was created but not initialized.", machine.Name) //clean me up. I'm dirty. - //TODO@thiyyakat add a pointer to a boolean variable indicating whether initialization has happened successfully. 
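The two UPDATE-event checks added above, together with the node-versus-machine annotation precedence that `manageMachinePreservation` applies later in this diff, can be sketched as standalone functions. This is a simplified, illustrative sketch: the annotation key strings and the helper names `shouldEnqueueForPreservation`/`effectivePreserve` are assumptions for the example, not the actual `machineutils` constants or controller methods.

```go
package main

import (
	"fmt"
	"maps"
)

// Illustrative key names only; the real constants are
// machineutils.PreserveMachineAnnotationKey and
// machineutils.LastAppliedNodePreserveValueAnnotationKey.
const (
	preserveKey    = "preserve"
	lastAppliedKey = "last-applied-node-preserve-value"
)

// shouldEnqueueForPreservation mirrors the two UPDATE-event checks:
// reconcile when the preserve annotation changes, and when an annotated
// machine changes phase (e.g. Failed -> Running stops preservation).
func shouldEnqueueForPreservation(oldAnn, newAnn map[string]string, oldPhase, newPhase string) bool {
	if oldAnn[preserveKey] != newAnn[preserveKey] {
		return true
	}
	_, annotated := newAnn[preserveKey]
	return annotated && oldPhase != newPhase
}

// effectivePreserve mirrors the precedence rule used during reconciliation:
// the machine's own preserve annotation counts only while no node annotation
// is active and none was ever applied; otherwise the node-side value wins,
// even when it is now empty (i.e. the node annotation was removed).
func effectivePreserve(nodeVal string, machineAnn map[string]string) (string, map[string]string) {
	if nodeVal == "" && machineAnn[lastAppliedKey] == "" {
		return machineAnn[preserveKey], machineAnn
	}
	out := maps.Clone(machineAnn)
	delete(out, preserveKey)
	out[lastAppliedKey] = nodeVal
	return nodeVal, out
}

func main() {
	// Annotation flipped from unset to "when-failed": reconcile.
	fmt.Println(shouldEnqueueForPreservation(
		map[string]string{},
		map[string]string{preserveKey: "when-failed"},
		"Running", "Running")) // true

	// Node annotation "now" overrides the machine's "when-failed" and is
	// recorded as the last applied node value.
	v, ann := effectivePreserve("now", map[string]string{preserveKey: "when-failed"})
	fmt.Println(v, ann[lastAppliedKey]) // now now
}
```

Note the asymmetry this encodes: once a node-side preserve annotation has been applied, its removal is itself meaningful (preservation stops), which is why the last-applied value must be tracked on the machine object.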
nodeName = getMachineStatusResponse.NodeName providerID = getMachineStatusResponse.ProviderID @@ -710,7 +728,6 @@ func (c *controller) shouldMachineBeMovedToTerminatingQueue(machine *v1alpha1.Ma if machine.DeletionTimestamp != nil && c.isCreationProcessing(machine) { klog.Warningf("Cannot delete machine %q, its deletionTimestamp is set but it is currently being processed by the creation flow\n", getMachineKey(machine)) } - return !c.isCreationProcessing(machine) && machine.DeletionTimestamp != nil } @@ -724,3 +741,103 @@ func (c *controller) isCreationProcessing(machine *v1alpha1.Machine) bool { _, isMachineInCreationFlow := c.pendingMachineCreationMap.Load(getMachineKey(machine)) return isMachineInCreationFlow } + +/* + SECTION + Machine Preservation operations +*/ + +// manageMachinePreservation manages machine preservation based on the preserve annotation values on the node and machine objects. +func (c *controller) manageMachinePreservation(ctx context.Context, machine *v1alpha1.Machine) (retry machineutils.RetryPeriod, err error) { + machineAnnotationsSynced := false + clone := machine.DeepCopy() + defer func() { + // This needs to be done for cases when machine is neither preserved nor un-preserved (e.g. when preserve changes from now to when-failed on a Failed machine), + // but the LastAppliedNodePreserveValueAnnotation needs to be synced to the machine object. 
+ // We compare annotation value in the clone with the original machine object to see if an update is required + if err == nil && !machineAnnotationsSynced && clone.Annotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] != machine.Annotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] { + _, err = c.controlMachineClient.Machines(clone.Namespace).Update(ctx, clone, metav1.UpdateOptions{}) + if err != nil { + klog.Errorf("error updating LastAppliedNodePreserveValueAnnotation value on machine %q: %v", machine.Name, err) + } + } + if err != nil { + if apierrors.IsConflict(err) { + retry = machineutils.ConflictRetry + } else { + retry = machineutils.ShortRetry + } + } else { + retry = machineutils.LongRetry + } + }() + + nodeName := clone.Labels[v1alpha1.NodeLabelKey] + nodeAnnotationValue := "" + if nodeName != "" { + nodeAnnotationValue, err = c.getNodePreserveAnnotationValue(nodeName) + if err != nil { + return + } + } + var effectivePreserveValue string + effectivePreserveValue, clone.Annotations = getEffectivePreservationAnnotations(nodeAnnotationValue, clone.Annotations) + if !machineutils.AllowedPreserveAnnotationValues.Has(effectivePreserveValue) { + if effectivePreserveValue == nodeAnnotationValue { + klog.Warningf("Preserve annotation value %q on node %q is invalid", effectivePreserveValue, nodeName) + } else { + klog.Warningf("Preserve annotation value %q on machine %q is invalid", effectivePreserveValue, clone.Name) + } + return + } + switch effectivePreserveValue { + case "", machineutils.PreserveMachineAnnotationValueFalse: + machineAnnotationsSynced, err = c.stopPreservationIfActive(ctx, clone, false) + case machineutils.PreserveMachineAnnotationValueWhenFailed: + if !machineutils.IsMachineFailed(clone) || machineutils.IsMachinePreservationExpired(clone) { + machineAnnotationsSynced, err = c.stopPreservationIfActive(ctx, clone, false) + } else { + machineAnnotationsSynced, err = c.preserveMachine(ctx, clone, 
effectivePreserveValue) + } + case machineutils.PreserveMachineAnnotationValueNow: + if machineutils.IsMachinePreservationExpired(clone) { + // on timing out, remove preserve annotation to prevent incorrect re-preservation + machineAnnotationsSynced, err = c.stopPreservationIfActive(ctx, clone, true) + } else { + machineAnnotationsSynced, err = c.preserveMachine(ctx, clone, effectivePreserveValue) + } + case machineutils.PreserveMachineAnnotationValuePreservedByMCM: + if !machineutils.IsMachineFailed(clone) || machineutils.IsMachinePreservationExpired(clone) { + // To prevent incorrect re-preservation of a recovered, previously auto-preserved machine on future failures + // (since the autoPreserveFailedMachineCount maintained by the machineSetController, may have changed), + // in addition to stopping preservation, we also remove the preservation annotation on the machine. + machineAnnotationsSynced, err = c.stopPreservationIfActive(ctx, clone, true) + } else { + machineAnnotationsSynced, err = c.preserveMachine(ctx, clone, effectivePreserveValue) + } + } + if err != nil { + return + } + // This is to handle the case where a preserved machine recovers from Failed to Running + // in which case, pods should be allowed to be scheduled onto the node + if machineutils.IsMachineActive(clone) && nodeName != "" { + err = c.uncordonNodeIfCordoned(ctx, nodeName) + } + return +} + +// getEffectivePreservationAnnotations returns the effective preservation value, and updates the machine Annotations related to preservation +func getEffectivePreservationAnnotations(nodeAnnotationValue string, machineAnnotations map[string]string) (string, map[string]string) { + // If there is no active node annotation AND no previously-applied node annotation, + // enforce machine's preserve annotation. + // Otherwise, the node annotation takes precedence (even if now empty/removed). 
+ if nodeAnnotationValue == "" && machineAnnotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] == "" { + return machineAnnotations[machineutils.PreserveMachineAnnotationKey], machineAnnotations + } + clonedMachineAnnotations := make(map[string]string) + maps.Copy(clonedMachineAnnotations, machineAnnotations) + delete(clonedMachineAnnotations, machineutils.PreserveMachineAnnotationKey) + clonedMachineAnnotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] = nodeAnnotationValue + return nodeAnnotationValue, clonedMachineAnnotations +} diff --git a/pkg/util/provider/machinecontroller/machine_safety.go b/pkg/util/provider/machinecontroller/machine_safety.go index b8ad153e2..3785f07c1 100644 --- a/pkg/util/provider/machinecontroller/machine_safety.go +++ b/pkg/util/provider/machinecontroller/machine_safety.go @@ -76,9 +76,10 @@ func (c *controller) reconcileClusterMachineSafetyAPIServer(_ string) error { } machine.Status.CurrentStatus = v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineRunning, - TimeoutActive: false, - LastUpdateTime: metav1.Now(), + Phase: v1alpha1.MachineRunning, + TimeoutActive: false, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime, } machine.Status.LastOperation = v1alpha1.LastOperation{ Description: "Machine Health Timeout was reset due to APIServer being unreachable", diff --git a/pkg/util/provider/machinecontroller/machine_test.go b/pkg/util/provider/machinecontroller/machine_test.go index 76214eac7..94ae90c59 100644 --- a/pkg/util/provider/machinecontroller/machine_test.go +++ b/pkg/util/provider/machinecontroller/machine_test.go @@ -7,6 +7,7 @@ package controller import ( "context" "fmt" + k8stesting "k8s.io/client-go/testing" "math" "time" @@ -16,7 +17,6 @@ import ( metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/util/validation/field" - k8stesting "k8s.io/client-go/testing" machineapi 
"github.com/gardener/machine-controller-manager/pkg/apis/machine" "github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1" @@ -196,82 +196,6 @@ var _ = Describe("machine", func() { ) }) - /* - Describe("##updateMachineConditions", func() { - Describe("Update conditions of a non-existing machine", func() { - It("should return error", func() { - stop := make(chan struct{}) - defer close(stop) - - objects := []runtime.Object{} - c, trackers := createController(stop, testNamespace, objects, nil, nil) - defer trackers.Stop() - - testMachine := &v1alpha1.Machine{ - ObjectMeta: metav1.ObjectMeta{ - Name: "testmachine", - Namespace: testNamespace, - }, - Status: v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineTerminating, - }, - }, - } - conditions := []corev1.NodeCondition{} - var _, err = c.updateMachineConditions(testMachine, conditions) - Expect(err).Should(Not(BeNil())) - }) - }) - DescribeTable("Update conditions of an existing machine", - func(phase v1alpha1.MachinePhase, conditions []corev1.NodeCondition, expectedPhase v1alpha1.MachinePhase) { - stop := make(chan struct{}) - defer close(stop) - - testMachine := &v1alpha1.Machine{ - ObjectMeta: metav1.ObjectMeta{ - Name: "testmachine", - Namespace: testNamespace, - }, - Status: v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: phase, - }, - }, - } - objects := []runtime.Object{} - objects = append(objects, testMachine) - - c, trackers := createController(stop, testNamespace, objects, nil, nil) - defer trackers.Stop() - - var updatedMachine, err = c.updateMachineConditions(testMachine, conditions) - Expect(updatedMachine.Status.Conditions).Should(BeEquivalentTo(conditions)) - Expect(updatedMachine.Status.CurrentStatus.Phase).Should(BeIdenticalTo(expectedPhase)) - Expect(err).Should(BeNil()) - }, - Entry("healthy status but machine terminating", v1alpha1.MachineTerminating, []corev1.NodeCondition{ - { - Type: corev1.NodeReady, - 
Status: corev1.ConditionTrue, - }, - }, v1alpha1.MachineTerminating), - Entry("unhealthy status but machine running", v1alpha1.MachineRunning, []corev1.NodeCondition{ - { - Type: corev1.NodeReady, - Status: corev1.ConditionFalse, - }, - }, v1alpha1.MachineUnknown), - Entry("healthy status but machine not running", v1alpha1.MachineAvailable, []corev1.NodeCondition{ - { - Type: corev1.NodeReady, - Status: corev1.ConditionTrue, - }, - }, v1alpha1.MachineRunning), - ) - }) - */ - Describe("#ValidateMachine", func() { type data struct { action machineapi.Machine @@ -2782,7 +2706,7 @@ var _ = Describe("machine", func() { }, }, expect: expect{ - err: fmt.Errorf("failed to create update conditions for node \"fakeID-0\": Failed to update node"), + err: fmt.Errorf("failed to create/update conditions on node \"fakeID-0\": Failed to update node"), retry: machineutils.ShortRetry, machine: newMachine( &v1alpha1.MachineTemplateSpec{ @@ -2801,7 +2725,7 @@ var _ = Describe("machine", func() { LastUpdateTime: metav1.Now(), }, LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Drain failed due to failure in update of node conditions - %s. Will retry in next sync. %s", "failed to create update conditions for node \"fakeID-0\": Failed to update node", machineutils.InitiateDrain), + Description: fmt.Sprintf("Drain failed due to failure in update of node conditions - %s. Will retry in next sync. 
%s", "failed to create/update conditions on node \"fakeID-0\": Failed to update node", machineutils.InitiateDrain), State: v1alpha1.MachineStateFailed, Type: v1alpha1.MachineOperationDelete, LastUpdateTime: metav1.Now(), @@ -2996,7 +2920,7 @@ var _ = Describe("machine", func() { }, }, expect: expect{ - err: fmt.Errorf("failed to create update conditions for node \"fakeNode-0\": Failed to update node"), + err: fmt.Errorf("failed to create/update conditions on node \"fakeNode-0\": Failed to update node"), retry: machineutils.ShortRetry, machine: newMachine( &v1alpha1.MachineTemplateSpec{ @@ -3015,7 +2939,7 @@ var _ = Describe("machine", func() { LastUpdateTime: metav1.Now(), }, LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Drain failed due to failure in update of node conditions - %s. Will retry in next sync. %s", "failed to create update conditions for node \"fakeNode-0\": Failed to update node", machineutils.InitiateDrain), + Description: fmt.Sprintf("Drain failed due to failure in update of node conditions - %s. Will retry in next sync. 
%s", "failed to create/update conditions on node \"fakeNode-0\": Failed to update node", machineutils.InitiateDrain), State: v1alpha1.MachineStateFailed, Type: v1alpha1.MachineOperationDelete, LastUpdateTime: metav1.Now(), @@ -4074,470 +3998,486 @@ var _ = Describe("machine", func() { }), ) }) - /* - Describe("#checkMachineTimeout", func() { - type setup struct { - machines []*v1alpha1.Machine - } - type action struct { - machine string - } - type expect struct { - machine *v1alpha1.Machine - err bool - } - type data struct { - setup setup - action action - expect expect - } - objMeta := &metav1.ObjectMeta{ - GenerateName: "machine", - Namespace: "test", - } - machineName := "machine-0" - timeOutOccurred := -21 * time.Minute - timeOutNotOccurred := -5 * time.Minute - creationTimeOut := 20 * time.Minute - healthTimeOut := 10 * time.Minute - DescribeTable("##Machine Timeout Scenarios", - func(data *data) { - stop := make(chan struct{}) - defer close(stop) - machineObjects := []runtime.Object{} - for _, o := range data.setup.machines { - machineObjects = append(machineObjects, o) - } - coreObjects := []runtime.Object{} - controller, trackers := createController(stop, objMeta.Namespace, machineObjects, nil, coreObjects) - defer trackers.Stop() - waitForCacheSync(stop, controller) - action := data.action - machine, err := controller.controlMachineClient.Machines(objMeta.Namespace).Get(action.machine, metav1.GetOptions{}) - //Expect(err).ToNot(HaveOccurred()) - controller.checkMachineTimeout(machine) - actual, err := controller.controlMachineClient.Machines(machine.Namespace).Get(machine.Name, metav1.GetOptions{}) - Expect(err).To(BeNil()) - Expect(actual.Status.CurrentStatus.Phase).To(Equal(data.expect.machine.Status.CurrentStatus.Phase)) - Expect(actual.Status.CurrentStatus.//TimeoutActive).To(Equal(data.expect.machine.Status.CurrentStatus.//TimeoutActive)) - Expect(actual.Status.LastOperation.Description).To(Equal(data.expect.machine.Status.LastOperation.Description)) 
- Expect(actual.Status.LastOperation.State).To(Equal(data.expect.machine.Status.LastOperation.State)) - Expect(actual.Status.LastOperation.Type).To(Equal(data.expect.machine.Status.LastOperation.Type)) - }, - Entry("Machine is still running", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineRunning, - //TimeoutActive: false, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutNotOccurred)), - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine % successfully joined the cluster", machineName), - State: v1alpha1.MachineStateSuccessful, - Type: v1alpha1.MachineOperationCreate, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutNotOccurred)), - }, - }, nil, nil, nil), - }, - action: action{ - machine: machineName, - }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineRunning, - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine % successfully joined the cluster", machineName), - State: v1alpha1.MachineStateSuccessful, - Type: v1alpha1.MachineOperationCreate, - }, - }, nil, nil, nil), - }, - }), - Entry("Machine creation has still not timed out", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineUnknown, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutNotOccurred)), - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine %s is unhealthy - changing MachineState to Unknown", machineName), - State: v1alpha1.MachineStateProcessing, - Type: v1alpha1.MachineOperationCreate, - LastUpdateTime: 
metav1.NewTime(time.Now().Add(timeOutNotOccurred)), - }, - }, nil, nil, nil), - }, - action: action{ - machine: machineName, - }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineUnknown, - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine %s is unhealthy - changing MachineState to Unknown", machineName), - State: v1alpha1.MachineStateProcessing, - Type: v1alpha1.MachineOperationCreate, - }, - }, nil, nil, nil), - }, - }), - Entry("Machine creation has timed out", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachinePending, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutOccurred)), - }, - LastOperation: v1alpha1.LastOperation{ - Description: "Creating machine on cloud provider", - State: v1alpha1.MachineStateProcessing, - Type: v1alpha1.MachineOperationCreate, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutOccurred)), - }, - }, nil, nil, nil), - }, - action: action{ - machine: machineName, + Describe("#getEffectivePreservationAnnotations", func() { + type setup struct { + nodeAnnotationValue string + machineAnnotations map[string]string + } + type expect struct { + effectivePreserveValue string + machineAnnotations map[string]string + } + + type testCase struct { + setup setup + expect expect + } + + DescribeTable("getEffectivePreservationAnnotations scenarios", + func(tc testCase) { + + preserveValue, updatedMachineAnnotations := getEffectivePreservationAnnotations(tc.setup.nodeAnnotationValue, tc.setup.machineAnnotations) + Expect(preserveValue).To(Equal(tc.expect.effectivePreserveValue)) + 
Expect(updatedMachineAnnotations[machineutils.PreserveMachineAnnotationKey]).To(Equal(tc.expect.machineAnnotations[machineutils.PreserveMachineAnnotationKey])) + Expect(updatedMachineAnnotations[machineutils.LastAppliedNodePreserveValueAnnotationKey]).To(Equal(tc.expect.machineAnnotations[machineutils.LastAppliedNodePreserveValueAnnotationKey])) + }, + Entry("when node is not annotated and laNodeAnnotationValue is empty, should return machine's annotation value and empty string", testCase{ + setup: setup{ + nodeAnnotationValue: "", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "A", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineFailed, - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf( - "Machine %s failed to join the cluster in %s minutes.", - machineName, - creationTimeOut, - ), - State: v1alpha1.MachineStateFailed, - Type: v1alpha1.MachineOperationCreate, - }, - }, nil, nil, nil), + }, + expect: expect{ + effectivePreserveValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "A", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - }), - Entry("Machine health has timed out", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineUnknown, - LastUpdateTime: metav1.NewTime(time.Now().Add(timeOutOccurred)), - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine %s is unhealthy - changing MachineState to Unknown", machineName), - State: v1alpha1.MachineStateProcessing, - Type: v1alpha1.MachineOperationHealthCheck, - LastUpdateTime: 
metav1.NewTime(time.Now().Add(timeOutOccurred)), - }, - }, nil, nil, nil), + }, + }), + Entry("when neither node nor machine is not annotated and laNodeAnnotationValue is empty, should return two empty strings", testCase{ + setup: setup{ + nodeAnnotationValue: "", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - action: action{ - machine: machineName, + }, + expect: expect{ + effectivePreserveValue: "", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineFailed, - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf( - "Machine %s is not healthy since %s minutes. Changing status to failed. 
Node Conditions: %+v", - machineName, - healthTimeOut, - []corev1.NodeCondition{}, - ), - State: v1alpha1.MachineStateFailed, - Type: v1alpha1.MachineOperationHealthCheck, - }, - }, nil, nil, nil), + }, + }), + Entry("when neither node nor machine is annotated and laNodeAnnotationValue is \"A\", should return two empty strings", testCase{ + setup: setup{ + nodeAnnotationValue: "", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "A", }, - }), - ) - }) - Describe("#updateMachineState", func() { - type setup struct { - machines []*v1alpha1.Machine - nodes []*corev1.Node - } - type action struct { - machine string - } - type expect struct { - machine *v1alpha1.Machine - err bool - } - type data struct { - setup setup - action action - expect expect - } - objMeta := &metav1.ObjectMeta{ - GenerateName: "machine", - // using default namespace for non-namespaced objects - // as our current fake client is with the assumption - // that all objects are namespaced - Namespace: "", - } - machineName := "machine-0" - DescribeTable("##Different machine state update scenrios", - func(data *data) { - stop := make(chan struct{}) - defer close(stop) - machineObjects := []runtime.Object{} - for _, o := range data.setup.machines { - machineObjects = append(machineObjects, o) - } - coreObjects := []runtime.Object{} - for _, o := range data.setup.nodes { - coreObjects = append(coreObjects, o) - } - controller, trackers := createController(stop, objMeta.Namespace, machineObjects, nil, coreObjects) - defer trackers.Stop() - waitForCacheSync(stop, controller) - action := data.action - machine, err := controller.controlMachineClient.Machines(objMeta.Namespace).Get(action.machine, metav1.GetOptions{}) - Expect(err).ToNot(HaveOccurred()) - controller.updateMachineState(machine) - actual, err := controller.controlMachineClient.Machines(objMeta.Namespace).Get(action.machine, metav1.GetOptions{}) - 
Expect(err).To(BeNil()) - Expect(actual.Name).To(Equal(data.expect.machine.Name)) - Expect(actual.Status.Node).To(Equal(data.expect.machine.Status.Node)) - Expect(actual.Status.CurrentStatus.Phase).To(Equal(data.expect.machine.Status.CurrentStatus.Phase)) - Expect(actual.Status.CurrentStatus.//TimeoutActive).To(Equal(data.expect.machine.Status.CurrentStatus.//TimeoutActive)) - Expect(actual.Status.LastOperation.State).To(Equal(data.expect.machine.Status.LastOperation.State)) - Expect(actual.Status.LastOperation.Type).To(Equal(data.expect.machine.Status.LastOperation.Type)) - Expect(actual.Status.LastOperation.Description).To(Equal(data.expect.machine.Status.LastOperation.Description)) - if data.expect.machine.Labels != nil { - if _, ok := data.expect.machine.Labels["node"]; ok { - Expect(actual.Labels["node"]).To(Equal(data.expect.machine.Labels["node"])) - } - } - for i := range actual.Status.Conditions { - Expect(actual.Status.Conditions[i].Type).To(Equal(data.expect.machine.Status.Conditions[i].Type)) - Expect(actual.Status.Conditions[i].Status).To(Equal(data.expect.machine.Status.Conditions[i].Status)) - Expect(actual.Status.Conditions[i].Reason).To(Equal(data.expect.machine.Status.Conditions[i].Reason)) - Expect(actual.Status.Conditions[i].Message).To(Equal(data.expect.machine.Status.Conditions[i].Message)) - } }, - Entry("Machine does not have a node backing", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{}, nil, nil, nil), + expect: expect{ + effectivePreserveValue: "", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - action: action{ - machine: machineName, + }, + }), + Entry("when node is annotated, laNodeAnnotationValue is empty, and machine is not annotated, should return node's annotation value as effective value and last applied value", 
testCase{ + setup: setup{ + nodeAnnotationValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{}, nil, nil, nil), + }, + expect: expect{ + effectivePreserveValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "A", }, - }), - Entry("Node object backing machine not found and machine conditions are empty", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - Node: "dummy-node", - }, nil, nil, nil), + }, + }), + Entry("when node is annotated, laNodeAnnotationValue is empty, and machine is annotated differently, should return node's annotation value as effective value and last applied value", testCase{ + setup: setup{ + nodeAnnotationValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "B", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "", }, - action: action{ - machine: machineName, + }, + expect: expect{ + effectivePreserveValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "A", }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - Node: "dummy-node", - }, nil, nil, nil), + }, + }), + Entry("when node, machine annotation values and laNodeAnnotationValue are the same, should return node's annotation value as effective value and last applied value", testCase{ + setup: setup{ + nodeAnnotationValue: "A", + machineAnnotations: map[string]string{ + 
machineutils.PreserveMachineAnnotationKey: "A", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "A", }, - }), - Entry("Machine is running but node object is lost", &data{ - setup: setup{ - machines: newMachines(1, &v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - Node: "dummy-node", - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineRunning, - //TimeoutActive: false, - LastUpdateTime: metav1.Now(), - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf("Machine % successfully joined the cluster", machineName), - State: v1alpha1.MachineStateSuccessful, - Type: v1alpha1.MachineOperationCreate, - LastUpdateTime: metav1.Now(), - }, - Conditions: []corev1.NodeCondition{ - { - Message: "kubelet is posting ready status", - Reason: "KubeletReady", - Status: "True", - Type: "Ready", - }, - }, - }, nil, nil, nil), + }, + expect: expect{ + effectivePreserveValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "A", }, - action: action{ - machine: machineName, + }, + }), + Entry("when node, machine annotation values are the same and laNodeAnnotationValue differs, should return node's annotation value as effective value and last applied value", testCase{ + setup: setup{ + nodeAnnotationValue: "A", + machineAnnotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: "A", + machineutils.LastAppliedNodePreserveValueAnnotationKey: "B", }, - expect: expect{ - machine: newMachine(&v1alpha1.MachineTemplateSpec{ - ObjectMeta: *newObjectMeta(objMeta, 0), - }, &v1alpha1.MachineStatus{ - Node: "dummy-node", - CurrentStatus: v1alpha1.CurrentStatus{ - Phase: v1alpha1.MachineUnknown, - LastUpdateTime: metav1.Now(), - }, - LastOperation: v1alpha1.LastOperation{ - Description: fmt.Sprintf( - "Node object went missing. 
Machine %s is unhealthy - changing MachineState to Unknown",
-                            machineName,
-                        ),
-                        State: v1alpha1.MachineStateProcessing,
-                        Type: v1alpha1.MachineOperationHealthCheck,
-                        LastUpdateTime: metav1.Now(),
-                    },
-                    Conditions: []corev1.NodeCondition{
-                        {
-                            Message: "kubelet is posting ready status",
-                            Reason: "KubeletReady",
-                            Status: "True",
-                            Type: "Ready",
-                        },
-                    },
-                }, nil, nil, nil),
+                },
+                expect: expect{
+                    effectivePreserveValue: "A",
+                    machineAnnotations: map[string]string{
+                        machineutils.PreserveMachineAnnotationKey: "",
+                        machineutils.LastAppliedNodePreserveValueAnnotationKey: "A",
                     },
-            }),
-            Entry("Machine and node both are present and kubelet ready status is updated", &data{
-                setup: setup{
-                    machines: newMachines(1, &v1alpha1.MachineTemplateSpec{
-                        ObjectMeta: *newObjectMeta(objMeta, 0),
-                    }, &v1alpha1.MachineStatus{
-                        Node: "machine",
-                        CurrentStatus: v1alpha1.CurrentStatus{
-                            Phase: v1alpha1.MachinePending,
-                            LastUpdateTime: metav1.Now(),
-                        },
-                        LastOperation: v1alpha1.LastOperation{
-                            Description: "Creating machine on cloud provider",
-                            State: v1alpha1.MachineStateProcessing,
-                            Type: v1alpha1.MachineOperationCreate,
-                            LastUpdateTime: metav1.Now(),
-                        },
-                        Conditions: []corev1.NodeCondition{
-                            {
-                                Message: "kubelet is not ready",
-                                Reason: "KubeletReady",
-                                Status: "False",
-                                Type: "Ready",
-                            },
-                        },
-                    }, nil, nil, nil),
-                    nodes: []*corev1.Node{
-                        {
-                            ObjectMeta: *newObjectMeta(objMeta, 0),
-                            Status: corev1.NodeStatus{
-                                Conditions: []corev1.NodeCondition{
-                                    {
-                                        Message: "kubelet is posting ready status",
-                                        Reason: "KubeletReady",
-                                        Status: "True",
-                                        Type: "Ready",
-                                    },
-                                },
-                            },
-                        },
+                },
+            }),
+        )
+
+    })
+    Describe("#manageMachinePreservation", func() {
+        type setup struct {
+            machineAnnotationValue string
+            nodeAnnotationValue    string
+            laNodePreserveValue    string
+            nodeName               string
+            machinePhase           v1alpha1.MachinePhase
+            preserveExpiryTime     *metav1.Time
+        }
+        type expect struct {
+            retry                   machineutils.RetryPeriod
+            preserveExpiryTimeIsSet bool
+            nodeCondition           *corev1.NodeCondition
+            machineAnnotationValue  string
+            laNodePreserveValue     string
+            err                     error
+        }
+        type testCase struct {
+            setup  setup
+            expect expect
+        }
+
+        DescribeTable("manageMachinePreservation behavior scenarios",
+            func(tc testCase) {
+
+                stop := make(chan struct{})
+                defer close(stop)
+
+                var controlMachineObjects []runtime.Object
+                var targetCoreObjects []runtime.Object
+
+                // Build machine
+                machine := &v1alpha1.Machine{
+                    ObjectMeta: metav1.ObjectMeta{
+                        Namespace: testNamespace,
+                        Name:      "m1",
+                        Labels: map[string]string{
+                            v1alpha1.NodeLabelKey: tc.setup.nodeName,
+                        },
+                        Annotations: map[string]string{
+                            machineutils.PreserveMachineAnnotationKey:              tc.setup.machineAnnotationValue,
+                            machineutils.LastAppliedNodePreserveValueAnnotationKey: tc.setup.laNodePreserveValue,
+                        },
+                    }, Status: v1alpha1.MachineStatus{
+                        CurrentStatus: v1alpha1.CurrentStatus{
+                            Phase:              tc.setup.machinePhase,
+                            LastUpdateTime:     metav1.Now(),
+                            PreserveExpiryTime: tc.setup.preserveExpiryTime,
                         },
                     },
-                action: action{
-                    machine: machineName,
-                },
-                expect: expect{
-                    machine: newMachine(&v1alpha1.MachineTemplateSpec{
-                        ObjectMeta: *newObjectMeta(objMeta, 0),
-                    }, &v1alpha1.MachineStatus{
-                        Node: "machine",
-                        CurrentStatus: v1alpha1.CurrentStatus{
-                            Phase: v1alpha1.MachineRunning,
-                            //TimeoutActive: false,
-                            LastUpdateTime: metav1.Now(),
-                        },
-                        LastOperation: v1alpha1.LastOperation{
-                            Description: "Machine machine-0 successfully joined the cluster",
-                            State: v1alpha1.MachineStateSuccessful,
-                            Type: v1alpha1.MachineOperationCreate,
-                            LastUpdateTime: metav1.Now(),
-                        },
-                        Conditions: []corev1.NodeCondition{
-                            {
-                                Message: "kubelet is posting ready status",
-                                Reason: "KubeletReady",
-                                Status: "True",
-                                Type: "Ready",
-                            },
-                        },
-                    }, nil, nil, nil),
-                },
-            }),
-            Entry("Machine object does not have node-label and node exists", &data{
-                setup: setup{
-                    machines: newMachines(1, &v1alpha1.MachineTemplateSpec{
-                        ObjectMeta: *newObjectMeta(objMeta, 0),
-                    }, &v1alpha1.MachineStatus{
-                        Node: "node",
-                    }, nil, nil, nil),
-                    nodes: []*corev1.Node{
-                        {
-                            ObjectMeta: metav1.ObjectMeta{
-                                Name: "node-0",
-                            },
+                }
+
+                controlMachineObjects = append(controlMachineObjects, machine)
+                if tc.setup.nodeName != "" && tc.setup.nodeName != "invalid" {
+                    node := &corev1.Node{
+                        ObjectMeta: metav1.ObjectMeta{
+                            Name: tc.setup.nodeName,
+                            Annotations: map[string]string{
+                                machineutils.PreserveMachineAnnotationKey: tc.setup.nodeAnnotationValue,
                             },
                         },
-                    },
-                action: action{
-                    machine: machineName,
-                },
-                expect: expect{
-                    machine: newMachine(&v1alpha1.MachineTemplateSpec{
-                        ObjectMeta: metav1.ObjectMeta{
-                            Name: "machine-0",
-                        },
-                    }, &v1alpha1.MachineStatus{
-                        Node: "node",
-                    }, nil, nil,
-                        map[string]string{
-                            "node": "node-0",
-                        },
-                    ),
-                },
-            }),
-        )
-    })
-    */
+                        Status: corev1.NodeStatus{
+                            Conditions: []corev1.NodeCondition{},
+                        },
+                    }
+                    targetCoreObjects = append(targetCoreObjects, node)
+                }
+                c, trackers := createController(stop, testNamespace, controlMachineObjects, nil, targetCoreObjects, nil, false)
+                defer trackers.Stop()
+                waitForCacheSync(stop, c)
+                retry, err := c.manageMachinePreservation(context.TODO(), machine)
+
+                Expect(retry).To(Equal(tc.expect.retry))
+                if tc.expect.err != nil {
+                    Expect(err).To(HaveOccurred())
+                    Expect(err.Error()).To(Equal(tc.expect.err.Error()))
+                    return
+                }
+                Expect(err).ToNot(HaveOccurred())
+                waitForCacheSync(stop, c)
+                updatedMachine, err := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), machine.Name, metav1.GetOptions{})
+                Expect(err).ToNot(HaveOccurred())
+                if tc.expect.preserveExpiryTimeIsSet {
+                    Expect(updatedMachine.Status.CurrentStatus.PreserveExpiryTime.IsZero()).To(BeFalse())
+                } else {
+                    Expect(updatedMachine.Status.CurrentStatus.PreserveExpiryTime.IsZero()).To(BeTrue())
+                }
+                if tc.setup.nodeName != "" {
+                    updatedNode, err := c.targetCoreClient.CoreV1().Nodes().Get(context.TODO(), tc.setup.nodeName, metav1.GetOptions{})
+                    Expect(err).ToNot(HaveOccurred())
+                    found := false
+                    if tc.expect.nodeCondition != nil {
+                        for _, cond := range updatedNode.Status.Conditions {
+                            if cond.Type == tc.expect.nodeCondition.Type {
+                                found = true
+                                Expect(cond.Status).To(Equal(tc.expect.nodeCondition.Status))
+                                break
+                            }
+                        }
+                    }
+                    if tc.expect.nodeCondition != nil {
+                        Expect(found).To(BeTrue())
+                    } else {
+                        Expect(found).To(BeFalse())
+                    }
+                }
+                Expect(updatedMachine.Annotations[machineutils.PreserveMachineAnnotationKey]).To(Equal(tc.expect.machineAnnotationValue))
+                Expect(updatedMachine.Annotations[machineutils.LastAppliedNodePreserveValueAnnotationKey]).To(Equal(tc.expect.laNodePreserveValue))
+            },
+            Entry("when no preserve annotation is set on machine and node, should return LongRetry", testCase{
+                setup: setup{
+                    nodeName: "node-1",
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                },
+            }),
+            Entry("when preserve annotation 'now' is added to node of Running machine, should successfully start preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineRunning,
+                },
+                expect: expect{
+                    laNodePreserveValue:     machineutils.PreserveMachineAnnotationValueNow,
+                    preserveExpiryTimeIsSet: true,
+                    nodeCondition: &corev1.NodeCondition{
+                        Type:   v1alpha1.NodePreserved,
+                        Status: corev1.ConditionTrue},
+                    retry: machineutils.LongRetry,
+                },
+            }),
+            Entry("when preserve annotation 'when-failed' added to node of Running machine, should not start preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed,
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineRunning,
+                },
+                expect: expect{
+                    laNodePreserveValue:     machineutils.PreserveMachineAnnotationValueWhenFailed,
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                },
+            }),
+            Entry("when node of Failed machine is annotated with `when-failed`, should start preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed,
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineFailed,
+                },
+                expect: expect{
+                    laNodePreserveValue:     machineutils.PreserveMachineAnnotationValueWhenFailed,
+                    preserveExpiryTimeIsSet: true,
+                    nodeCondition: &corev1.NodeCondition{
+                        Type:   v1alpha1.NodePreserved,
+                        Status: corev1.ConditionTrue},
+                    retry: machineutils.LongRetry,
+                },
+            }),
+            Entry("when node of preserved machine is annotated with preserve value 'false', should stop preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueFalse,
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineRunning,
+                    preserveExpiryTime:  &metav1.Time{Time: metav1.Now().Add(1 * time.Hour)},
+                },
+                expect: expect{
+                    laNodePreserveValue:     machineutils.PreserveMachineAnnotationValueFalse,
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                },
+            }),
+            Entry("when machine is annotated for auto-preservation by MCM, should start preservation", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValuePreservedByMCM,
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineFailed,
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: true,
+                    nodeCondition: &corev1.NodeCondition{
+                        Type:   v1alpha1.NodePreserved,
+                        Status: corev1.ConditionTrue},
+                    retry:                  machineutils.LongRetry,
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValuePreservedByMCM,
+                },
+            }),
+            Entry("when node is annotated and preservation times out, should stop preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineRunning,
+                    preserveExpiryTime:  &metav1.Time{Time: metav1.Now().Add(-1 * time.Minute)},
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           &corev1.NodeCondition{Type: v1alpha1.NodePreserved, Status: corev1.ConditionFalse},
+                    retry:                   machineutils.LongRetry,
+                },
+            }),
+            Entry("when machine is annotated and preservation times out, should stop preservation", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineRunning,
+                    preserveExpiryTime:     &metav1.Time{Time: metav1.Now().Add(-1 * time.Minute)},
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           &corev1.NodeCondition{Type: v1alpha1.NodePreserved, Status: corev1.ConditionFalse},
+                    retry:                   machineutils.LongRetry,
+                },
+            }),
+            Entry("when invalid preserve annotation is added on node of un-preserved machine, should do nothing", testCase{
+                setup: setup{
+                    nodeAnnotationValue: "invalidValue",
+                    nodeName:            "node-1",
+                    machinePhase:        v1alpha1.MachineRunning,
+                },
+                expect: expect{
+                    laNodePreserveValue:     "invalidValue",
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                    err:                     nil,
+                },
+            }),
+            Entry("when invalid preserve annotation is added on machine of un-preserved machine, and node is not annotated, should do nothing", testCase{
+                setup: setup{
+                    machineAnnotationValue: "invalidValue",
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineRunning,
+                },
+                expect: expect{
+                    machineAnnotationValue:  "invalidValue",
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                    err:                     nil,
+                },
+            }),
+            Entry("when a machine is annotated with preserve=now, but has no backing node, should start preservation", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeAnnotationValue:    "",
+                    nodeName:               "",
+                    machinePhase:           v1alpha1.MachineUnknown,
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: true,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                    machineAnnotationValue:  machineutils.PreserveMachineAnnotationValueNow,
+                    err:                     nil,
+                },
+            }),
+            Entry("when preservation times out for a machine annotated with preserve=now, but has no backing node, should stop preservation", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeName:               "",
+                    machinePhase:           v1alpha1.MachineUnknown,
+                    preserveExpiryTime:     &metav1.Time{Time: metav1.Now().Add(-1 * time.Minute)},
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    retry:                   machineutils.LongRetry,
+                    err:                     nil,
+                },
+            }),
+            Entry("when a machine has a backing node, but node retrieval fails", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValueNow,
+                    nodeAnnotationValue:    "",
+                    nodeName:               "invalid",
+                    machinePhase:           v1alpha1.MachineUnknown,
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           nil,
+                    machineAnnotationValue:  machineutils.PreserveMachineAnnotationValueNow,
+                    retry:                   machineutils.ShortRetry,
+                    err:                     fmt.Errorf("node %q not found", "invalid"),
+                },
+            }),
+            Entry("when auto-preserved machine moves to Running, should stop preservation and remove auto-preserve annotation", testCase{
+                setup: setup{
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValuePreservedByMCM,
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineRunning,
+                    preserveExpiryTime:     &metav1.Time{Time: metav1.Now().Add(1 * time.Hour)},
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           &corev1.NodeCondition{Type: v1alpha1.NodePreserved, Status: corev1.ConditionFalse},
+                    machineAnnotationValue:  "",
+                    retry:                   machineutils.LongRetry,
+                    err:                     nil,
+                },
+            }),
+            Entry("when node is annotated with 'now' and machine is annotated with 'when-failed', should start preservation and remove preserve annotation from machine", testCase{
+                setup: setup{
+                    nodeAnnotationValue:    machineutils.PreserveMachineAnnotationValueNow,
+                    machineAnnotationValue: machineutils.PreserveMachineAnnotationValueWhenFailed,
+                    laNodePreserveValue:    "",
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineRunning,
+                },
+                expect: expect{
+                    laNodePreserveValue:     machineutils.PreserveMachineAnnotationValueNow,
+                    preserveExpiryTimeIsSet: true,
+                    nodeCondition: &corev1.NodeCondition{
+                        Type:   v1alpha1.NodePreserved,
+                        Status: corev1.ConditionTrue},
+                    machineAnnotationValue: "",
+                    retry:                  machineutils.LongRetry,
+                    err:                    nil,
+                },
+            }),
+            // case possible when MCM goes down and node annotation value is cleared and machine is annotated
+            Entry("when node and machine are found to be annotated with \"\", and 'now', respectively and last applied node preserve value is 'now', should stop preservation", testCase{
+                setup: setup{
+                    nodeAnnotationValue:    "",
+                    laNodePreserveValue:    "now",
+                    machineAnnotationValue: "now",
+                    nodeName:               "node-1",
+                    machinePhase:           v1alpha1.MachineRunning,
+                    preserveExpiryTime:     &metav1.Time{Time: metav1.Now().Add(1 * time.Hour)},
+                },
+                expect: expect{
+                    preserveExpiryTimeIsSet: false,
+                    nodeCondition:           &corev1.NodeCondition{Type: v1alpha1.NodePreserved, Status: corev1.ConditionFalse},
+                    machineAnnotationValue:  "",
+                    laNodePreserveValue:     "",
+                    retry:                   machineutils.LongRetry,
+                    err:                     nil,
+                },
+            }),
+        )
+    })
 })
diff --git a/pkg/util/provider/machinecontroller/machine_util.go b/pkg/util/provider/machinecontroller/machine_util.go
index 36fa6f414..969e85797 100644
--- a/pkg/util/provider/machinecontroller/machine_util.go
+++ b/pkg/util/provider/machinecontroller/machine_util.go
@@ -403,7 +403,7 @@ func (c *controller) inPlaceUpdate(ctx context.Context, machine *v1alpha1.Machin
         cond.Reason = v1alpha1.ReadyForUpdate
         cond.LastTransitionTime = metav1.Now()
         cond.Message = "Node is ready for in-place update"
-        if err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, getNodeName(machine), *cond); err != nil {
+        if _, err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, getNodeName(machine), *cond); err != nil {
             return machineutils.ShortRetry, err
         }
         // give machine time for update to get applied
@@ -478,7 +478,7 @@ func (c *controller) updateMachineStatusAndNodeCondition(ctx context.Context, ma
     cond.Reason = v1alpha1.DrainSuccessful
     cond.Message = "Node draining successful"
 
-    if err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, getNodeName(machine), *cond); err != nil {
+    if _, err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, getNodeName(machine), *cond); err != nil {
         return machineutils.ShortRetry, err
     }
 
@@ -941,8 +941,9 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
             klog.Warning(description)
 
             clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
-                Phase:          v1alpha1.MachineUnknown,
-                LastUpdateTime: metav1.Now(),
+                Phase:              v1alpha1.MachineUnknown,
+                LastUpdateTime:     metav1.Now(),
+                PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime,
             }
             clone.Status.LastOperation = v1alpha1.LastOperation{
                 Description: description,
@@ -995,7 +996,8 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
             clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
                 Phase: v1alpha1.MachineRunning,
                 // TimeoutActive: false,
-                LastUpdateTime: metav1.Now(),
+                LastUpdateTime:     metav1.Now(),
+                PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime,
             }
             cloneDirty = true
         }
@@ -1009,7 +1011,8 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
             clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
                 Phase: v1alpha1.MachineUnknown,
                 // TimeoutActive: true,
-                LastUpdateTime: metav1.Now(),
+                LastUpdateTime:     metav1.Now(),
+                PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime,
             }
             clone.Status.LastOperation = v1alpha1.LastOperation{
                 Description: description,
@@ -1085,7 +1088,6 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
         }
         if timeElapsed > timeOutDuration {
             // Machine health timeout occurred while joining or rejoining of machine
-
             if !isMachinePending && !isMachineInPlaceUpdating && !disableHealthTimeout {
                 // Timeout occurred due to machine being unhealthy for too long
                 description = fmt.Sprintf(
@@ -1094,7 +1096,6 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
                     timeOutDuration,
                     machine.Status.Conditions,
                 )
-
                 machineDeployName := getMachineDeploymentName(machine)
                 // creating lock for machineDeployment, if not allocated
                 c.permitGiver.RegisterPermits(machineDeployName, 1)
@@ -1138,8 +1139,9 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
                 LastUpdateTime: metav1.Now(),
             }
             clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
-                Phase:          v1alpha1.MachineFailed,
-                LastUpdateTime: metav1.Now(),
+                Phase:              v1alpha1.MachineFailed,
+                LastUpdateTime:     metav1.Now(),
+                PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime,
             }
             cloneDirty = true
         }
@@ -1164,10 +1166,8 @@ func (c *controller) reconcileMachineHealth(ctx context.Context, machine *v1alph
             // Return error to end the reconcile
             err = errSuccessfulPhaseUpdate
         }
-
         return machineutils.ShortRetry, err
     }
-
     return machineutils.LongRetry, nil
 }
@@ -1293,7 +1293,8 @@ func (c *controller) setMachineTerminationStatus(ctx context.Context, deleteMach
     clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
         Phase: v1alpha1.MachineTerminating,
         // TimeoutActive: false,
-        LastUpdateTime: metav1.Now(),
+        LastUpdateTime:     metav1.Now(),
+        PreserveExpiryTime: nil,
     }
     _, err := c.controlMachineClient.Machines(clone.Namespace).UpdateStatus(ctx, clone, metav1.UpdateOptions{})
@@ -1654,7 +1655,7 @@ func (c *controller) drainNode(ctx context.Context, deleteMachineRequest *driver
         if forceDeleteMachine {
             klog.Warningf("Failed to update node conditions: %v. However, since it's a force deletion shall continue deletion of VM.", err)
         } else {
-            klog.Errorf("Drain failed due to failure in update of node conditions: %v", err)
+            klog.Errorf("drain failed due to failure in update of node conditions: %v", err)
             description = fmt.Sprintf("Drain failed due to failure in update of node conditions - %s. Will retry in next sync. %s", err.Error(), machineutils.InitiateDrain)
             state = v1alpha1.MachineStateFailed
@@ -1713,7 +1714,7 @@ func (c *controller) drainNode(ctx context.Context, deleteMachineRequest *driver
             description = fmt.Sprintf("Drain failed due to - %s. However, since it's a force deletion shall continue deletion of VM. %s", err.Error(), machineutils.DelVolumesAttachments)
             state = v1alpha1.MachineStateProcessing
         } else {
-            klog.Warningf("Drain failed for machine %q , providerID %q ,backing node %q. \nBuf:%v \nErrBuf:%v \nErr-Message:%v", machine.Name, getProviderID(machine), getNodeName(machine), buf, errBuf, err)
+            klog.Errorf("drain failed for machine %q , providerID %q ,backing node %q. \nBuf:%v \nErrBuf:%v \nErr-Message:%v", machine.Name, getProviderID(machine), getNodeName(machine), buf, errBuf, err)
             description = fmt.Sprintf("Drain failed due to - %s. Will retry in next sync. %s", err.Error(), machineutils.InitiateDrain)
             state = v1alpha1.MachineStateFailed
@@ -2034,7 +2035,7 @@ func (c *controller) getEffectiveHealthTimeout(machine *v1alpha1.Machine) *metav
     return effectiveHealthTimeout
 }
 
-// getEffectiveHealthTimeout returns the creationTimeout set on the machine-object, otherwise returns the timeout set using the global-flag.
+// getEffectiveCreationTimeout returns the creationTimeout set on the machine-object, otherwise returns the timeout set using the global-flag.
 func (c *controller) getEffectiveCreationTimeout(machine *v1alpha1.Machine) *metav1.Duration {
     var effectiveCreationTimeout *metav1.Duration
     if machine.Spec.MachineConfiguration != nil && machine.Spec.MachineConfiguration.MachineCreationTimeout != nil {
@@ -2055,6 +2056,17 @@ func (c *controller) getEffectiveInPlaceUpdateTimeout(machine *v1alpha1.Machine)
     return effectiveDependenciesUpdateTimeout
 }
 
+// getEffectiveMachinePreserveTimeout returns the MachinePreserveTimeout set on the machine-object, otherwise returns the timeout set using the global-flag.
+func (c *controller) getEffectiveMachinePreserveTimeout(machine *v1alpha1.Machine) *metav1.Duration {
+    var effectivePreserveTimeout *metav1.Duration
+    if machine.Spec.MachineConfiguration != nil && machine.Spec.MachineConfiguration.MachinePreserveTimeout != nil {
+        effectivePreserveTimeout = machine.Spec.MachineConfiguration.MachinePreserveTimeout
+    } else {
+        effectivePreserveTimeout = &c.safetyOptions.MachinePreserveTimeout
+    }
+    return effectivePreserveTimeout
+}
+
 // getEffectiveNodeConditions returns the nodeConditions set on the machine-object, otherwise returns the conditions set using the global-flag.
 func (c *controller) getEffectiveNodeConditions(machine *v1alpha1.Machine) *string {
     var effectiveNodeConditions *string
@@ -2098,7 +2110,7 @@ func (c *controller) UpdateNodeTerminationCondition(ctx context.Context, machine
         setTerminationReasonByPhase(machine.Status.CurrentStatus.Phase, &terminationCondition)
     }
 
-    err = nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, nodeName, terminationCondition)
+    _, err = nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, nodeName, terminationCondition)
     if apierrors.IsNotFound(err) {
         return nil
     }
@@ -2118,7 +2130,8 @@ func (c *controller) updateMachineToFailedState(ctx context.Context, description
     clone.Status.CurrentStatus = v1alpha1.CurrentStatus{
         Phase: v1alpha1.MachineFailed,
         // TimeoutActive: false,
-        LastUpdateTime: metav1.Now(),
+        LastUpdateTime:     metav1.Now(),
+        PreserveExpiryTime: machine.Status.CurrentStatus.PreserveExpiryTime,
     }
     _, err := c.controlMachineClient.Machines(clone.Namespace).UpdateStatus(ctx, clone, metav1.UpdateOptions{})
@@ -2155,7 +2168,11 @@ func (c *controller) canMarkMachineFailed(machineDeployName, machineName, namesp
     for _, machine := range machineList {
         if machine.Status.CurrentStatus.Phase != v1alpha1.MachineUnknown && machine.Status.CurrentStatus.Phase != v1alpha1.MachineRunning {
-            inProgress++
+            // since Preserved Failed machines are not replaced immediately,
+            // they need not be considered towards inProgress
+            if machine.Status.CurrentStatus.PreserveExpiryTime == nil {
+                inProgress++
+            }
             switch machine.Status.CurrentStatus.Phase {
             case v1alpha1.MachineTerminating:
                 terminating++
@@ -2333,3 +2350,356 @@ func (c *controller) fetchMatchingNodeName(machineName string) (string, error) {
     }
     return "", fmt.Errorf("machine %q not found in node lister for machine %q", machineName, machineName)
 }
+
+/*
+SECTION
+Utility Functions for Machine Preservation
+*/
+
+// preserveMachine contains logic to start the preservation of a machine and node.
+func (c *controller) preserveMachine(ctx context.Context, machine *v1alpha1.Machine, preserveValue string) (bool, error) {
+    var err error
+    machineObjectUpdated := false
+    nodeName := machine.Labels[v1alpha1.NodeLabelKey]
+    if machine.Status.CurrentStatus.PreserveExpiryTime == nil {
+        klog.V(4).Infof("Starting preservation flow for machine %q.", machine.Name)
+        // Step 1: Add preserveExpiryTime to machine status
+        machine, err = c.setPreserveExpiryTimeOnMachine(ctx, machine)
+        if err != nil {
+            return machineObjectUpdated, err
+        }
+        machineObjectUpdated = true
+    }
+    if nodeName == "" {
+        // Machine has no backing node, preservation is complete
+        klog.V(2).Infof("Machine %q without backing node is preserved successfully till %v.", machine.Name, machine.Status.CurrentStatus.PreserveExpiryTime)
+        return machineObjectUpdated, nil
+    }
+    // Machine has a backing node
+    node, err := c.nodeLister.Get(nodeName)
+    if err != nil {
+        klog.Errorf("error trying to get node %q of machine %q: %v. Retrying.", nodeName, machine.Name, err)
+        return machineObjectUpdated, err
+    }
+    existingNodePreservedCondition := nodeops.GetCondition(node, v1alpha1.NodePreserved)
+    // checks if preservation is already complete
+    if existingNodePreservedCondition != nil && existingNodePreservedCondition.Status == v1.ConditionTrue {
+        return machineObjectUpdated, nil
+    }
+    // Step 2: Add annotations to prevent scale down of node by CA
+    updatedNode, err := c.addCAScaleDownDisabledAnnotationOnNode(ctx, node)
+    if err != nil {
+        return machineObjectUpdated, err
+    }
+    var drainErr error
+    if shouldPreservedNodeBeDrained(existingNodePreservedCondition, machine.Status.CurrentStatus.Phase) {
+        // Step 3: If machine is in Failed Phase, drain the backing node
+        drainErr = c.drainPreservedNode(ctx, machine)
+    }
+    newCond, needsUpdate := computeNewNodePreservedCondition(machine.Status.CurrentStatus, preserveValue, drainErr, existingNodePreservedCondition)
+    if needsUpdate {
+        // Step 4: Update NodePreserved Condition on Node, with drain status
+        _, err = nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, updatedNode.Name, *newCond)
+        if drainErr != nil {
+            klog.Errorf("error draining preserved node %q for machine %q : %v", nodeName, machine.Name, drainErr)
+            return machineObjectUpdated, drainErr
+        }
+        if err != nil {
+            klog.Errorf("error trying to update node preserved condition for node %q of machine %q : %v", nodeName, machine.Name, err)
+            return machineObjectUpdated, err
+        }
+    }
+    klog.V(2).Infof("Machine %q and backing node preserved successfully till %v.", machine.Name, machine.Status.CurrentStatus.PreserveExpiryTime)
+    return machineObjectUpdated, nil
+}
+
+// stopPreservationIfActive stops the preservation of the machine and node, if preserved, and returns true if machine object has been updated
+func (c *controller) stopPreservationIfActive(ctx context.Context, machine *v1alpha1.Machine, removePreservationAnnotations bool) (bool, error) {
+    // removal of preserveExpiryTime is the last step of stopping preservation
+    // therefore, if preserveExpiryTime is not set, machine is not preserved
+    nodeName := machine.Labels[v1alpha1.NodeLabelKey]
+    if machine.Status.CurrentStatus.PreserveExpiryTime == nil {
+        return false, nil
+    }
+    // if there is no backing node
+    if nodeName == "" {
+        // remove annotation from machine if needed
+        if removePreservationAnnotations {
+            var err error
+            machine, err = c.removePreserveAnnotationOnMachine(ctx, machine)
+            if err != nil {
+                return false, err
+            }
+        }
+        err := c.clearMachinePreserveExpiryTime(ctx, machine)
+        if err != nil {
+            return false, err
+        }
+        klog.V(2).Infof("Preservation of machine %q with no backing node has stopped.", machine.Name)
+        return true, nil
+    }
+
+    // Machine has a backing node
+    node, err := c.nodeLister.Get(nodeName)
+    if err != nil {
+        // if node is not found and error is simply returned, then preservation will never be stopped on machine
+        // therefore, this error is handled specifically
+        if apierrors.IsNotFound(err) {
+            // Node not found, proceed to clear preserveExpiryTime on machine
+            klog.Warningf("Node %q of machine %q not found. Proceeding to clear PreserveExpiryTime on machine.", nodeName, machine.Name)
+            err := c.clearMachinePreserveExpiryTime(ctx, machine)
+            if err != nil {
+                return false, err
+            }
+            klog.V(2).Infof("Preservation of machine %q has stopped.", machine.Name)
+            return true, nil
+        }
+        klog.Errorf("error trying to get node %q of machine %q: %v. Retrying.", nodeName, machine.Name, err)
+        return false, err
+    }
+    // prepare NodeCondition to set preservation as stopped
+    preservedConditionFalse := v1.NodeCondition{
+        Type:               v1alpha1.NodePreserved,
+        Status:             v1.ConditionFalse,
+        LastTransitionTime: metav1.Now(),
+        Reason:             v1alpha1.PreservationStopped,
+    }
+    // Step 1: change node condition to reflect that preservation has stopped
+    updatedNode, err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, node.Name, preservedConditionFalse)
+    if err != nil {
+        return false, err
+    }
+    // Step 2: remove annotations from node
+    err = c.removePreservationRelatedAnnotationsOnNode(ctx, updatedNode, removePreservationAnnotations)
+    if err != nil {
+        return false, err
+    }
+    // Step 3: remove annotation from machine if needed
+    if removePreservationAnnotations {
+        machine, err = c.removePreserveAnnotationOnMachine(ctx, machine)
+        if err != nil {
+            return false, err
+        }
+    }
+    // Step 4: update machine status to set preserve expiry time to nil
+    err = c.clearMachinePreserveExpiryTime(ctx, machine)
+    if err != nil {
+        return false, err
+    }
+    klog.V(2).Infof("Preservation of machine %q with backing node %q has stopped.", machine.Name, nodeName)
+    return true, nil
+}
+
+// setPreserveExpiryTimeOnMachine sets the PreserveExpiryTime on the machine object's Status.CurrentStatus to now + preserve timeout
+func (c *controller) setPreserveExpiryTimeOnMachine(ctx context.Context, machine *v1alpha1.Machine) (*v1alpha1.Machine, error) {
+    preservedCurrentStatus := v1alpha1.CurrentStatus{
+        Phase:              machine.Status.CurrentStatus.Phase,
+        TimeoutActive:      machine.Status.CurrentStatus.TimeoutActive,
+        LastUpdateTime:     metav1.Now(),
+        PreserveExpiryTime: &metav1.Time{Time: metav1.Now().Add(c.getEffectiveMachinePreserveTimeout(machine).Duration)},
+    }
+    machine.Status.CurrentStatus = preservedCurrentStatus
+    updatedMachine, err := c.controlMachineClient.Machines(machine.Namespace).UpdateStatus(ctx, machine, metav1.UpdateOptions{})
+    if err != nil {
+        klog.Errorf("error updating preserveExpiryTime on machine %q: %v", machine.Name, err)
+        return nil, err
+    }
+    klog.V(2).Infof("Machine %q preserved till %v.", machine.Name, preservedCurrentStatus.PreserveExpiryTime)
+    return updatedMachine, nil
+}
+
+// computeNewNodePreservedCondition returns the NodeCondition with the values set according to the preserveValue and the stage of Preservation
+func computeNewNodePreservedCondition(currentStatus v1alpha1.CurrentStatus, preserveValue string, drainErr error, existingNodeCondition *v1.NodeCondition) (*v1.NodeCondition, bool) {
+    const preserveExpiryMessageSuffix = "Machine preserved until"
+    var newNodePreservedCondition *v1.NodeCondition
+    var needsUpdate bool
+    if existingNodeCondition == nil {
+        newNodePreservedCondition = &v1.NodeCondition{
+            Type:               v1alpha1.NodePreserved,
+            Status:             v1.ConditionFalse,
+            LastTransitionTime: metav1.Now(),
+        }
+        needsUpdate = true
+    } else {
+        newNodePreservedCondition = existingNodeCondition.DeepCopy()
+    }
+    machinePhase := currentStatus.Phase
+    if machinePhase == v1alpha1.MachineFailed {
+        if drainErr == nil {
+            if !strings.Contains(newNodePreservedCondition.Message, v1alpha1.PreservedNodeDrainSuccessful) {
+                newNodePreservedCondition.Message = fmt.Sprintf("%s %s %v.", v1alpha1.PreservedNodeDrainSuccessful, preserveExpiryMessageSuffix, currentStatus.PreserveExpiryTime)
+                newNodePreservedCondition.Status = v1.ConditionTrue
+                needsUpdate = true
+            }
+        } else if !strings.Contains(newNodePreservedCondition.Message, v1alpha1.PreservedNodeDrainUnsuccessful) {
+            newNodePreservedCondition.Message = fmt.Sprintf("%s %s %v.", v1alpha1.PreservedNodeDrainUnsuccessful, preserveExpiryMessageSuffix, currentStatus.PreserveExpiryTime)
+            newNodePreservedCondition.Status = v1.ConditionFalse
+            needsUpdate = true
+        }
+    } else if newNodePreservedCondition.Status != v1.ConditionTrue {
+        newNodePreservedCondition.Status = v1.ConditionTrue
+        newNodePreservedCondition.Message = fmt.Sprintf("%s %v.", preserveExpiryMessageSuffix, currentStatus.PreserveExpiryTime)
+        needsUpdate = true
+    }
+    if preserveValue == machineutils.PreserveMachineAnnotationValuePreservedByMCM {
+        newNodePreservedCondition.Reason = v1alpha1.PreservedByMCM
+    } else {
+        newNodePreservedCondition.Reason = v1alpha1.PreservedByUser
+    }
+    return newNodePreservedCondition, needsUpdate
+}
+
+// shouldPreservedNodeBeDrained returns true if the machine's backing node must be drained, else false
+func shouldPreservedNodeBeDrained(existingCondition *v1.NodeCondition, machinePhase v1alpha1.MachinePhase) bool {
+    if machinePhase == v1alpha1.MachineFailed {
+        if existingCondition == nil {
+            return true
+        }
+        return !strings.Contains(existingCondition.Message, v1alpha1.PreservedNodeDrainSuccessful)
+    }
+    return false
+}
+
+// clearMachinePreserveExpiryTime clears the PreserveExpiryTime on the machine object's Status.CurrentStatus
+func (c *controller) clearMachinePreserveExpiryTime(ctx context.Context, machine *v1alpha1.Machine) error {
+    if machine.Status.CurrentStatus.PreserveExpiryTime == nil {
+        return nil
+    }
+    machine.Status.CurrentStatus.PreserveExpiryTime = nil
+    machine.Status.CurrentStatus.LastUpdateTime = metav1.Now()
+    _, err := c.controlMachineClient.Machines(machine.Namespace).UpdateStatus(ctx, machine, metav1.UpdateOptions{})
+    if err != nil {
+        klog.Errorf("machine/status UPDATE failed for machine %q. Retrying, error: %s", machine.Name, err)
+        return err
+    }
+    return nil
+}
+
+func (c *controller) removePreserveAnnotationOnMachine(ctx context.Context, machine *v1alpha1.Machine) (*v1alpha1.Machine, error) {
+    if machine.Annotations == nil || (machine.Annotations[machineutils.PreserveMachineAnnotationKey] == "" && machine.Annotations[machineutils.LastAppliedNodePreserveValueAnnotationKey] == "") {
+        return machine, nil
+    }
+    clone := machine.DeepCopy()
+    delete(clone.Annotations, machineutils.PreserveMachineAnnotationKey)
+    delete(clone.Annotations, machineutils.LastAppliedNodePreserveValueAnnotationKey)
+    updatedClone, err := c.controlMachineClient.Machines(clone.Namespace).Update(ctx, clone, metav1.UpdateOptions{})
+    if err != nil {
+        klog.Errorf("failed to delete preserve annotation on machine %q. error: %v", machine.Name, err)
+        return nil, err
+    }
+    return updatedClone, nil
+}
+
+// drainPreservedNode attempts to drain the node backing a preserved machine
+func (c *controller) drainPreservedNode(ctx context.Context, machine *v1alpha1.Machine) error {
+    var (
+        // Declarations
+        err             error
+        forceDeletePods bool
+        timeOutOccurred bool
+        description     string
+        readOnlyFileSystemCondition, nodeReadyCondition v1.NodeCondition
+
+        // Initialization
+        maxEvictRetries        = int32(math.Min(float64(*c.getEffectiveMaxEvictRetries(machine)), c.getEffectiveDrainTimeout(machine).Seconds()/drain.PodEvictionRetryInterval.Seconds()))
+        pvDetachTimeOut        = c.safetyOptions.PvDetachTimeout.Duration
+        pvReattachTimeOut      = c.safetyOptions.PvReattachTimeout.Duration
+        timeOutDuration        = c.getEffectiveDrainTimeout(machine).Duration
+        forceDrainLabelPresent = machine.Labels["force-drain"] == "True"
+        nodeName               = machine.Labels[v1alpha1.NodeLabelKey]
+        nodeNotReadyDuration   = 5 * time.Minute
+        ReadonlyFilesystem     v1.NodeConditionType = "ReadonlyFilesystem"
+    )
+    for _, condition := range machine.Status.Conditions {
+        if condition.Type == v1.NodeReady {
+            nodeReadyCondition = condition
+        } else if
condition.Type == ReadonlyFilesystem { + readOnlyFileSystemCondition = condition + } + } + + // verify and log node object's existence + _, err = c.nodeLister.Get(nodeName) + if err == nil { + klog.V(3).Infof("(drainNode) For node %q, machine %q, nodeReadyCondition: %s, readOnlyFileSystemCondition: %s", nodeName, machine.Name, nodeReadyCondition, readOnlyFileSystemCondition) + } else if apierrors.IsNotFound(err) { + klog.Warningf("(drainNode) Node %q for machine %q doesn't exist, so drain will finish instantly", nodeName, machine.Name) + } + + if !isConditionEmpty(nodeReadyCondition) && (nodeReadyCondition.Status != v1.ConditionTrue) && (time.Since(nodeReadyCondition.LastTransitionTime.Time) > nodeNotReadyDuration) { + message := "Setting forceDeletePods to true for drain as machine is NotReady for over 5min" + forceDeletePods = true + printLogInitError(message, &err, &description, machine, true) + } else if !isConditionEmpty(readOnlyFileSystemCondition) && (readOnlyFileSystemCondition.Status != v1.ConditionFalse) && (time.Since(readOnlyFileSystemCondition.LastTransitionTime.Time) > nodeNotReadyDuration) { + message := "Setting forceDeletePods to true for drain as machine is in ReadonlyFilesystem for over 5min" + forceDeletePods = true + printLogInitError(message, &err, &description, machine, true) + } + + // TODO@thiyyakat: how to calculate timeout? In the case of preserve=now, PreserveExpiryTime will not coincide with time of failure in which case pods will get force + // drained. + // current solution: since we want to know when machine transitioned to Failed, the code uses LastUpdateTime. + timeOutOccurred = utiltime.HasTimeOutOccurred(machine.Status.CurrentStatus.LastUpdateTime, timeOutDuration) + if forceDrainLabelPresent || timeOutOccurred { + forceDeletePods = true + timeOutDuration = 1 * time.Minute + maxEvictRetries = 1 + klog.V(2).Infof( + "Force delete/drain has been triggered for machine %q with providerID %q and backing node %q. 
Timeout Occurred:%t, force-drain label present:%t", + machine.Name, + getProviderID(machine), + getNodeName(machine), + timeOutOccurred, + forceDrainLabelPresent, + ) + } else { + klog.V(2).Infof( + "Drain has been triggered for preserved machine %q with providerID %q and backing node %q with drain-timeout:%v & maxEvictRetries:%d", + machine.Name, + getProviderID(machine), + getNodeName(machine), + timeOutDuration, + maxEvictRetries, + ) + } + + buf := bytes.NewBuffer([]byte{}) + errBuf := bytes.NewBuffer([]byte{}) + + drainOptions := drain.NewDrainOptions( + c.targetCoreClient, + c.targetKubernetesVersion, + timeOutDuration, + maxEvictRetries, + pvDetachTimeOut, + pvReattachTimeOut, + nodeName, + -1, + forceDeletePods, + true, + true, + true, + buf, + errBuf, + c.driver, + c.pvcLister, + c.pvLister, + c.pdbLister, + c.nodeLister, + c.podLister, + c.volumeAttachmentHandler, + c.podSynced, + ) + klog.V(3).Infof("(drainNode) Invoking RunDrain, timeOutDuration: %s", timeOutDuration) + err = drainOptions.RunDrain(ctx) + if err != nil { + klog.Errorf("drain failed for machine %q , providerID %q ,backing node %q. 
\nBuf:%v \nErrBuf:%v \nErr-Message:%v", machine.Name, getProviderID(machine), getNodeName(machine), buf, errBuf, err) + return err + } + if forceDeletePods { + klog.V(3).Infof("Force drain successful for machine %q , providerID %q ,backing node %q.", machine.Name, getProviderID(machine), getNodeName(machine)) + } else { + klog.V(3).Infof("Drain successful for machine %q , providerID %q ,backing node %q.", machine.Name, getProviderID(machine), getNodeName(machine)) + } + return nil +} diff --git a/pkg/util/provider/machinecontroller/machine_util_test.go b/pkg/util/provider/machinecontroller/machine_util_test.go index 53e92a6ba..4c3051139 100644 --- a/pkg/util/provider/machinecontroller/machine_util_test.go +++ b/pkg/util/provider/machinecontroller/machine_util_test.go @@ -9,6 +9,7 @@ import ( "encoding/json" "errors" "fmt" + "github.com/gardener/machine-controller-manager/pkg/controller/autoscaler" "k8s.io/klog/v2" "time" @@ -3957,4 +3958,751 @@ var _ = Describe("machine_util", func() { }), ) }) + Describe("#preserveMachine", func() { + type setup struct { + machinePhase machinev1.MachinePhase + nodeName string + preserveValue string + isCAAnnotationPresent bool + preservedNodeCondition corev1.NodeCondition + } + type expect struct { + preserveNodeCondition corev1.NodeCondition + isPreserveExpiryTimeSet bool + isCAAnnotationPresent bool + err error + } + type testCase struct { + setup setup + expect expect + } + DescribeTable("##preserveMachine behaviour scenarios", + func(tc *testCase) { + stop := make(chan struct{}) + defer close(stop) + + var controlMachineObjects []runtime.Object + var targetCoreObjects []runtime.Object + + machine := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-1", + Namespace: testNamespace, + Labels: map[string]string{ + machinev1.NodeLabelKey: tc.setup.nodeName, + }, + }, + Spec: machinev1.MachineSpec{}, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + Phase: tc.setup.machinePhase, + 
LastUpdateTime: metav1.Now(), + PreserveExpiryTime: nil, + }, + }, + } + if tc.setup.nodeName != "" && tc.setup.nodeName != "err-backing-node" { + + node := corev1.Node{ + ObjectMeta: metav1.ObjectMeta{ + Name: tc.setup.nodeName, + Labels: map[string]string{}, + Annotations: map[string]string{}, + }, + Status: corev1.NodeStatus{ + Conditions: []corev1.NodeCondition{}, + }, + } + if tc.setup.isCAAnnotationPresent { + node.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey] = "true" + } + targetCoreObjects = append(targetCoreObjects, &node) + } + controlMachineObjects = append(controlMachineObjects, machine) + + c, trackers := createController(stop, testNamespace, controlMachineObjects, nil, targetCoreObjects, nil, false) + defer trackers.Stop() + waitForCacheSync(stop, c) + _, err := c.preserveMachine(context.TODO(), machine, tc.setup.preserveValue) + if tc.expect.err == nil { + Expect(err).To(BeNil()) + } else { + Expect(err).To(HaveOccurred()) + Expect(err.Error()).To(Equal(tc.expect.err.Error())) + } + updatedMachine, getErr := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), machine.Name, metav1.GetOptions{}) + Expect(getErr).To(BeNil()) + if tc.expect.isPreserveExpiryTimeSet { + Expect(updatedMachine.Status.CurrentStatus.PreserveExpiryTime.IsZero()).To(BeFalse()) + } else { + Expect(updatedMachine.Status.CurrentStatus.PreserveExpiryTime.IsZero()).To(BeTrue()) + } + if tc.setup.nodeName == "" || tc.setup.nodeName == "err-backing-node" { + return + } + updatedNode, getErr := c.targetCoreClient.CoreV1().Nodes().Get(context.TODO(), tc.setup.nodeName, metav1.GetOptions{}) + Expect(getErr).To(BeNil()) + Expect(updatedNode.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey]).To(Equal(autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationValue)) + if tc.expect.preserveNodeCondition.Type != "" { + updatedNodeCondition := nodeops.GetCondition(updatedNode, tc.expect.preserveNodeCondition.Type) + 
Expect(updatedNodeCondition.Status).To(Equal(tc.expect.preserveNodeCondition.Status)) + Expect(updatedNodeCondition.Reason).To(Equal(tc.expect.preserveNodeCondition.Reason)) + Expect(updatedNodeCondition.Message).To(ContainSubstring(tc.expect.preserveNodeCondition.Message)) + } + }, + Entry("when preserve=now and there is no backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineUnknown, + nodeName: "", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + }, + }), + Entry("when preserve=now, the machine is Running, and there is a backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineRunning, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + }, + }, + }), + Entry("when preserve=now, the machine has Failed, and there is a backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=now, the machine has Failed, and the preservation is incomplete after step 1 - adding preserveExpiryTime", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + 
isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=now, the machine has Failed, and the preservation is incomplete at step 2 - adding CA annotations", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + isCAAnnotationPresent: true, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=now, the machine has Failed, and the preservation is incomplete because of drain failure", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + isCAAnnotationPresent: true, + preservedNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainUnsuccessful, + }, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=when-failed, the machine has Failed, and there is a backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValueWhenFailed, + }, + expect: expect{ + err: nil, + 
isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=auto-preserved, the machine has Failed, and there is a backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "node-1", + preserveValue: machineutils.PreserveMachineAnnotationValuePreservedByMCM, + }, + expect: expect{ + err: nil, + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: true, + preserveNodeCondition: corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByMCM, + Message: machinev1.PreservedNodeDrainSuccessful, + }, + }, + }), + Entry("when preserve=now, the machine has Failed, and there is an error fetching backing node", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + nodeName: "err-backing-node", + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + }, + expect: expect{ + err: fmt.Errorf("node \"err-backing-node\" not found"), + isPreserveExpiryTimeSet: true, + isCAAnnotationPresent: false, + }, + }, + ), + ) + }) + Describe("#stopPreservationIfActive", func() { + type setup struct { + nodeName string + removePreserveAnnotation bool + } + type expect struct { + err error + } + type testCase struct { + setup setup + expect expect + } + DescribeTable("##stopPreservationIfActive behaviour scenarios", + func(tc *testCase) { + stop := make(chan struct{}) + defer close(stop) + + var controlMachineObjects []runtime.Object + var targetCoreObjects []runtime.Object + + machine := &machinev1.Machine{ + ObjectMeta: metav1.ObjectMeta{ + Name: "machine-1", + Namespace: testNamespace, + Labels: map[string]string{}, + }, + Spec: machinev1.MachineSpec{}, + Status: machinev1.MachineStatus{ + CurrentStatus: machinev1.CurrentStatus{ + 
Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: &metav1.Time{Time: time.Now().Add(10 * time.Minute)}, + }, + }, + } + if tc.setup.nodeName != "" && tc.setup.nodeName != "err-backing-node" { + machine.Labels[machinev1.NodeLabelKey] = tc.setup.nodeName + node := &corev1.Node{ + ObjectMeta: metav1.ObjectMeta{ + Name: "node-1", + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: machineutils.PreserveMachineAnnotationValueNow, + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey: autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationValue, + }, + }, + Status: corev1.NodeStatus{ + Conditions: []corev1.NodeCondition{ + { + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + }, + }, + }, + } + targetCoreObjects = append(targetCoreObjects, node) + + } else { + machine.Labels[machinev1.NodeLabelKey] = "" + } + + controlMachineObjects = append(controlMachineObjects, machine) + + c, trackers := createController(stop, testNamespace, controlMachineObjects, nil, targetCoreObjects, nil, false) + defer trackers.Stop() + waitForCacheSync(stop, c) + _, err := c.stopPreservationIfActive(context.TODO(), machine, tc.setup.removePreserveAnnotation) + if tc.expect.err != nil { + Expect(err).To(HaveOccurred()) + Expect(err.Error()).To(Equal(tc.expect.err.Error())) + return + } + Expect(err).To(BeNil()) + updatedMachine, getErr := c.controlMachineClient.Machines(testNamespace).Get(context.TODO(), machine.Name, metav1.GetOptions{}) + Expect(getErr).To(BeNil()) + Expect(updatedMachine.Status.CurrentStatus.PreserveExpiryTime.IsZero()).To(BeTrue()) + + if machine.Labels[machinev1.NodeLabelKey] == "" || machine.Labels[machinev1.NodeLabelKey] == "no-backing-node" { + return + } + updatedNode, getErr := c.targetCoreClient.CoreV1().Nodes().Get(context.TODO(), tc.setup.nodeName, metav1.GetOptions{}) + Expect(getErr).To(BeNil()) + updatedNodeCondition := 
nodeops.GetCondition(updatedNode, machinev1.NodePreserved) + Expect(updatedNodeCondition).ToNot(BeNil()) + Expect(updatedNodeCondition.Status).To(Equal(corev1.ConditionFalse)) + Expect(updatedNodeCondition.Reason).To(Equal(machinev1.PreservationStopped)) + if tc.setup.removePreserveAnnotation { + Expect(updatedNode.Annotations).NotTo(HaveKey(machineutils.PreserveMachineAnnotationKey)) + } else { + Expect(updatedNode.Annotations).To(HaveKey(machineutils.PreserveMachineAnnotationKey)) + } + + }, + Entry("when stopping preservation on a preserved machine with backing node and preserve annotation needs to be removed", &testCase{ + setup: setup{ + nodeName: "node-1", + removePreserveAnnotation: true, + }, + expect: expect{ + err: nil, + }, + }), + Entry("when stopping preservation on a preserved machine with backing node and preserve annotation shouldn't be removed", &testCase{ + setup: setup{ + nodeName: "node-1", + removePreserveAnnotation: false, + }, + expect: expect{ + err: nil, + }, + }), + Entry("when stopping preservation on a preserved machine with no backing node", &testCase{ + setup: setup{ + nodeName: "", + }, + expect: expect{ + err: nil, + }, + }), + Entry("when stopping preservation on a preserved machine, and the backing node is not found", &testCase{ + setup: setup{ + nodeName: "no-backing-node", + }, + expect: expect{ + err: nil, + }, + }), + ) + }) + Describe("#computeNewNodePreservedCondition", func() { + preserveExpiryTime := &metav1.Time{Time: time.Now().Add(2 * time.Hour)} + type setup struct { + currentStatus machinev1.CurrentStatus + preserveValue string + drainErr error + existingNodeCondition *corev1.NodeCondition + } + type expect struct { + newNodeCondition *corev1.NodeCondition + needsUpdate bool + } + type testCase struct { + setup setup + expect expect + } + DescribeTable("##computeNewNodePreservedCondition behaviour scenarios", + func(tc *testCase) { + newNodeCondition, needsUpdate := computeNewNodePreservedCondition( + 
tc.setup.currentStatus, + tc.setup.preserveValue, + tc.setup.drainErr, + tc.setup.existingNodeCondition, + ) + if tc.expect.newNodeCondition == nil { + Expect(newNodeCondition).To(BeNil()) + } else { + Expect(newNodeCondition.Type).To(Equal(tc.expect.newNodeCondition.Type)) + Expect(newNodeCondition.Status).To(Equal(tc.expect.newNodeCondition.Status)) + Expect(newNodeCondition.Reason).To(Equal(tc.expect.newNodeCondition.Reason)) + Expect(newNodeCondition.Message).To(Equal(tc.expect.newNodeCondition.Message)) + } + Expect(needsUpdate).To(Equal(tc.expect.needsUpdate)) + }, + Entry("when preserve=now, machine is Running, no existing condition", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineRunning, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: preserveExpiryTime, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + existingNodeCondition: nil, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("Machine preserved until %v.", preserveExpiryTime), + }, + needsUpdate: true, + }, + }), + Entry("when preserve=now, machine is Failed, drain successful, no existing condition", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: preserveExpiryTime, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + drainErr: nil, + existingNodeCondition: nil, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainSuccessful, preserveExpiryTime), + }, + needsUpdate: true, + }, + }), + Entry("when preserve=now, machine is Failed, drain is unsuccessful, no existing 
condition", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: preserveExpiryTime, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + drainErr: fmt.Errorf("test drain error"), + existingNodeCondition: nil, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainUnsuccessful, preserveExpiryTime), + }, + needsUpdate: true, + }, + }), + Entry("when machine auto-preserved by MCM, machine is Failed, drain is successful, no existing condition", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: preserveExpiryTime, + }, + preserveValue: machineutils.PreserveMachineAnnotationValuePreservedByMCM, + drainErr: nil, + existingNodeCondition: nil, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByMCM, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainSuccessful, preserveExpiryTime), + }, + needsUpdate: true, + }, + }), + Entry("when preserve=now, machine is Failed, drain is unsuccessful, existing condition present", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: preserveExpiryTime, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + drainErr: fmt.Errorf("test drain error"), + existingNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: "Machine preserved until " + 
preserveExpiryTime.String(), + }, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainUnsuccessful, preserveExpiryTime), + }, + needsUpdate: true, + }, + }), + Entry("when preserve=now, machine is Failed, drain is unsuccessful for the second time, existing condition present", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: &metav1.Time{Time: time.Now().Add(2 * time.Hour)}, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + drainErr: fmt.Errorf("test drain error"), + existingNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainUnsuccessful, preserveExpiryTime), + }, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainUnsuccessful, preserveExpiryTime), + }, + needsUpdate: false, + }, + }), + Entry("when preserve=now, machine is Failed, drain is successful, existing condition present and status is true", &testCase{ + setup: setup{ + currentStatus: machinev1.CurrentStatus{ + Phase: machinev1.MachineFailed, + LastUpdateTime: metav1.Now(), + PreserveExpiryTime: &metav1.Time{Time: time.Now().Add(2 * time.Hour)}, + }, + preserveValue: machineutils.PreserveMachineAnnotationValueNow, + drainErr: nil, + existingNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine 
preserved until %v.", machinev1.PreservedNodeDrainSuccessful, preserveExpiryTime), + }, + }, + expect: expect{ + newNodeCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionTrue, + Reason: machinev1.PreservedByUser, + Message: fmt.Sprintf("%s Machine preserved until %v.", machinev1.PreservedNodeDrainSuccessful, preserveExpiryTime), + }, + needsUpdate: false, + }, + }), + ) + }) + Describe("#shouldPreservedNodeBeDrained", func() { + type setup struct { + machinePhase machinev1.MachinePhase + existingCondition *corev1.NodeCondition + } + type expect struct { + shouldDrain bool + } + type testCase struct { + setup setup + expect expect + } + + DescribeTable("#shouldPreservedNodeBeDrained behaviour scenarios", + func(tc *testCase) { + shouldDrain := shouldPreservedNodeBeDrained(tc.setup.existingCondition, tc.setup.machinePhase) + Expect(shouldDrain).To(Equal(tc.expect.shouldDrain)) + }, + Entry("should return false when machine is Running", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineRunning, + }, + expect: expect{ + shouldDrain: false, + }, + }), + Entry("should return true when machine is Failed and there is no existing node condition", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + }, + expect: expect{ + shouldDrain: true, + }, + }), + Entry("should return true when machine is Failed and existing node condition message is PreservedNodeDrainUnsuccessful", &testCase{ + setup: setup{ + machinePhase: machinev1.MachineFailed, + existingCondition: &corev1.NodeCondition{ + Type: machinev1.NodePreserved, + Status: corev1.ConditionFalse, + Reason: machinev1.PreservedByUser, + Message: machinev1.PreservedNodeDrainUnsuccessful, + }, + }, + expect: expect{ + shouldDrain: true, + }, + }), + ) + }) + Describe("#removePreservationRelatedAnnotationsOnNode", func() { + type setup struct { + removePreserveAnnotation bool + CAAnnotationPresent bool + CAMCMAnnotationPresent bool + } + type expect struct { + 
err error + hasAnnotationKeys []string + deletedAnnotationKeys []string + } + type testCase struct { + setup setup + expect expect + } + DescribeTable("##removePreservationRelatedAnnotationsOnNode behaviour scenarios", + func(tc *testCase) { + stop := make(chan struct{}) + defer close(stop) + var targetCoreObjects []runtime.Object + node := &corev1.Node{ + ObjectMeta: metav1.ObjectMeta{ + Name: "node-1", + Annotations: map[string]string{ + machineutils.PreserveMachineAnnotationKey: machineutils.PreserveMachineAnnotationValueNow, + }, + }, + } + if tc.setup.CAAnnotationPresent { + node.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey] = autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationValue + } + if tc.setup.CAMCMAnnotationPresent { + node.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey] = autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMValue + } + targetCoreObjects = append(targetCoreObjects, node) + c, trackers := createController(stop, testNamespace, nil, nil, targetCoreObjects, nil, false) + defer trackers.Stop() + waitForCacheSync(stop, c) + err := c.removePreservationRelatedAnnotationsOnNode(context.TODO(), node, tc.setup.removePreserveAnnotation) + waitForCacheSync(stop, c) + updatedNode, getErr := c.targetCoreClient.CoreV1().Nodes().Get(context.TODO(), node.Name, metav1.GetOptions{}) + Expect(getErr).To(BeNil()) + if tc.expect.err != nil { + Expect(err).To(HaveOccurred()) + Expect(err.Error()).To(Equal(tc.expect.err.Error())) + } else { + Expect(err).To(BeNil()) + } + for _, key := range tc.expect.hasAnnotationKeys { + Expect(updatedNode.Annotations).To(HaveKey(key)) + } + for _, key := range tc.expect.deletedAnnotationKeys { + Expect(updatedNode.Annotations).NotTo(HaveKey(key)) + } + }, + Entry("when removePreserveAnnotation is true and ClusterAutoscalerScaleDownDisabledAnnotationByMCM annotation is present, should delete all preservation related annotations", &testCase{ + setup: setup{ + 
removePreserveAnnotation: true, + CAAnnotationPresent: true, + CAMCMAnnotationPresent: true, + }, + expect: expect{ + err: nil, + hasAnnotationKeys: []string{}, + deletedAnnotationKeys: []string{ + machineutils.PreserveMachineAnnotationKey, + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey, + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey, + }, + }, + }), + Entry("when removePreserveAnnotation is false and ClusterAutoscalerScaleDownDisabledAnnotationByMCM annotation is present, should delete only CA annotations", &testCase{ + setup: setup{ + removePreserveAnnotation: false, + CAAnnotationPresent: true, + CAMCMAnnotationPresent: true, + }, + expect: expect{ + err: nil, + hasAnnotationKeys: []string{ + machineutils.PreserveMachineAnnotationKey, + }, + deletedAnnotationKeys: []string{ + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey, + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey, + }, + }, + }), + Entry("when removePreserveAnnotation is true and ClusterAutoscalerScaleDownDisabledAnnotationByMCM is not present, should delete only preserve annotation", &testCase{ + setup: setup{ + removePreserveAnnotation: true, + CAAnnotationPresent: true, + CAMCMAnnotationPresent: false, + }, + expect: expect{ + err: nil, + hasAnnotationKeys: []string{ + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey, + }, + deletedAnnotationKeys: []string{ + machineutils.PreserveMachineAnnotationKey, + }, + }, + }), + Entry("when removePreserveAnnotation is false and ClusterAutoscalerScaleDownDisabledAnnotationByMCM is not present, should not delete any annotations", &testCase{ + setup: setup{ + removePreserveAnnotation: false, + CAAnnotationPresent: true, + CAMCMAnnotationPresent: false, + }, + expect: expect{ + err: nil, + hasAnnotationKeys: []string{ + autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey, + machineutils.PreserveMachineAnnotationKey, + }, + }, + }), + ) + }) }) diff --git 
a/pkg/util/provider/machinecontroller/node.go b/pkg/util/provider/machinecontroller/node.go index 7afa8991f..e701bc146 100644 --- a/pkg/util/provider/machinecontroller/node.go +++ b/pkg/util/provider/machinecontroller/node.go @@ -10,6 +10,7 @@ import ( "encoding/json" "errors" "fmt" + "github.com/gardener/machine-controller-manager/pkg/controller/autoscaler" "time" "github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1" @@ -100,6 +101,11 @@ func (c *controller) updateNode(oldObj, newObj any) { if nodeConditionsHaveChanged && !(isMachineCrashLooping || isMachineTerminating) { c.enqueueMachine(machine, fmt.Sprintf("handling node UPDATE event. Conditions of node %q differ from machine status", node.Name)) } + // to reconcile on change in annotations related to preservation + if node.Annotations[machineutils.PreserveMachineAnnotationKey] != oldNode.Annotations[machineutils.PreserveMachineAnnotationKey] { + c.enqueueMachine(machine, fmt.Sprintf("handling node UPDATE event. 
Preserve annotation changed for node %q", getNodeName(machine))) + return + } } func (c *controller) deleteNode(obj any) { @@ -333,3 +339,81 @@ func addedOrRemovedEssentialTaints(oldNode, node *corev1.Node, taintKeys []strin } return false } + +func (c *controller) getNodePreserveAnnotationValue(nodeName string) (string, error) { + node, err := c.nodeLister.Get(nodeName) + if err != nil { + klog.Errorf("error fetching node %q: %v", nodeName, err) + return "", err + } + return node.Annotations[machineutils.PreserveMachineAnnotationKey], nil +} + +func (c *controller) uncordonNodeIfCordoned(ctx context.Context, nodeName string) error { + node, err := c.nodeLister.Get(nodeName) + if err != nil { + return err + } + if !node.Spec.Unschedulable { + return nil + } + nodeClone := node.DeepCopy() + nodeClone.Spec.Unschedulable = false + _, err = c.targetCoreClient.CoreV1().Nodes().Update(ctx, nodeClone, metav1.UpdateOptions{}) + if err != nil { + klog.Errorf("error uncordoning node %q: %v", nodeName, err) + } + return err +} + +// removePreservationRelatedAnnotationsOnNode removes the cluster-autoscaler annotations that MCM added to disable scale-down of the preserved node and, if removePreserveAnnotation is true, the preserve annotation itself +func (c *controller) removePreservationRelatedAnnotationsOnNode(ctx context.Context, node *corev1.Node, removePreserveAnnotation bool) error { + // Nothing to remove if the node has no annotations + if node.Annotations == nil { + return nil + } + updateRequired := false + nodeCopy := node.DeepCopy() + // If CA scale-down disabled annotation was added by MCM, it can be safely removed. + // If the annotation was added by some other entity, then it should not be removed. 
+	if nodeCopy.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey] == autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMValue {
+		delete(nodeCopy.Annotations, autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey)
+		delete(nodeCopy.Annotations, autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey)
+		updateRequired = true
+	}
+	if removePreserveAnnotation && nodeCopy.Annotations[machineutils.PreserveMachineAnnotationKey] != "" {
+		delete(nodeCopy.Annotations, machineutils.PreserveMachineAnnotationKey)
+		updateRequired = true
+	}
+	if !updateRequired {
+		return nil
+	}
+	_, err := c.targetCoreClient.CoreV1().Nodes().Update(ctx, nodeCopy, metav1.UpdateOptions{})
+	if err != nil {
+		klog.Errorf("node UPDATE failed for node %q. Retrying, error: %s", node.Name, err)
+		return err
+	}
+	return nil
+}
+
+// addCAScaleDownDisabledAnnotationOnNode adds the cluster-autoscaler annotation to disable scale down of preserved node
+func (c *controller) addCAScaleDownDisabledAnnotationOnNode(ctx context.Context, node *corev1.Node) (*corev1.Node, error) {
+	// Check if annotation already exists with correct value
+	if node.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey] == autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationValue {
+		return node, nil
+	}
+	// Add annotation to disable CA scale down.
+	// Also add annotation expressing that MCM is the one who added this annotation, so that it can be removed safely when preservation is stopped.
+	nodeCopy := node.DeepCopy()
+	if node.Annotations == nil {
+		nodeCopy.Annotations = make(map[string]string)
+	}
+	nodeCopy.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey] = autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationValue
+	nodeCopy.Annotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMKey] = autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationByMCMValue
+	updatedNode, err := c.targetCoreClient.CoreV1().Nodes().Update(ctx, nodeCopy, metav1.UpdateOptions{})
+	if err != nil {
+		klog.Errorf("error trying to update CA annotation on node %q: %v", node.Name, err)
+		return nil, err
+	}
+	return updatedNode, nil
+}
diff --git a/pkg/util/provider/machineutils/utils.go b/pkg/util/provider/machineutils/utils.go
index 141e3a09c..9755d1c0e 100644
--- a/pkg/util/provider/machineutils/utils.go
+++ b/pkg/util/provider/machineutils/utils.go
@@ -6,11 +6,17 @@ package machineutils
 
 import (
-	"time"
-
-	v1 "k8s.io/api/core/v1"
-
+	"context"
 	"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
+	v1alpha1client "github.com/gardener/machine-controller-manager/pkg/client/clientset/versioned/typed/machine/v1alpha1"
+	v1alpha1listers "github.com/gardener/machine-controller-manager/pkg/client/listers/machine/v1alpha1"
+	v1 "k8s.io/api/core/v1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	errorsutil "k8s.io/apimachinery/pkg/util/errors"
+	"k8s.io/apimachinery/pkg/util/sets"
+	"k8s.io/client-go/util/retry"
+	"k8s.io/klog/v2"
+	"time"
 )
 
 const (
@@ -81,8 +87,40 @@ const (
 	// LabelKeyMachineSetScaleUpDisabled is the label key that indicates scaling up of the machine set is disabled.
 	LabelKeyMachineSetScaleUpDisabled = "node.machine.sapcloud.io/scale-up-disabled"
+
+	// PreserveMachineAnnotationKey is the annotation used to explicitly request that a Machine be preserved
+	PreserveMachineAnnotationKey = "node.machine.sapcloud.io/preserve"
+
+	// LastAppliedNodePreserveValueAnnotationKey is the annotation used to store the last preserve value applied by MCM
+	LastAppliedNodePreserveValueAnnotationKey = "node.machine.sapcloud.io/last-applied-node-preserve-value"
+
+	// PreserveMachineAnnotationValueNow is the annotation value used to explicitly request that
+	// a Machine be preserved immediately, irrespective of its current phase, and its phase is not changed
+	// on preservation
+	PreserveMachineAnnotationValueNow = "now"
+
+	// PreserveMachineAnnotationValueWhenFailed is the annotation value used to explicitly request that
+	// a Machine be preserved if and when it enters Failed phase
+	PreserveMachineAnnotationValueWhenFailed = "when-failed"
+
+	// PreserveMachineAnnotationValuePreservedByMCM is the annotation value used by the machineset controller to
+	// indicate to the machine controller that the machine must be auto-preserved.
+	// The AutoPreserveFailedMachineMax, set on the MCD, is enforced based on the number of machines annotated with this value.
+	PreserveMachineAnnotationValuePreservedByMCM = "auto-preserved"
+
+	// PreserveMachineAnnotationValueFalse is the annotation value used to
+	// 1) indicate to MCM that a machine must not be auto-preserved on failure
+	// and, 2) to stop auto-preservation of a machine that is already auto-preserved by MCM.
+	PreserveMachineAnnotationValueFalse = "false"
 )
 
+// AllowedPreserveAnnotationValues contains the allowed values for the preserve annotation
+var AllowedPreserveAnnotationValues = sets.New(PreserveMachineAnnotationValueNow, PreserveMachineAnnotationValueWhenFailed, PreserveMachineAnnotationValuePreservedByMCM, PreserveMachineAnnotationValueFalse, "")
+
+// PreventAutoPreserveAnnotationValues contains the values to check if a machine is already annotated for preservation,
+// in which case it should not be auto-preserved.
+var PreventAutoPreserveAnnotationValues = sets.New(PreserveMachineAnnotationValueNow, PreserveMachineAnnotationValueWhenFailed, PreserveMachineAnnotationValuePreservedByMCM, PreserveMachineAnnotationValueFalse)
+
 // RetryPeriod is an alias for specifying the retry period
 type RetryPeriod time.Duration
 
@@ -124,3 +162,41 @@ func IsMachineFailed(p *v1alpha1.Machine) bool {
 func IsMachineTriggeredForDeletion(m *v1alpha1.Machine) bool {
 	return m.Annotations[MachinePriority] == "1"
 }
+
+// IsMachinePreservationExpired checks if the preserve expiry time has passed for a machine
+func IsMachinePreservationExpired(m *v1alpha1.Machine) bool {
+	t := m.Status.CurrentStatus.PreserveExpiryTime
+	return t != nil && !t.After(time.Now())
+}
+
+// see https://github.com/kubernetes/kubernetes/issues/21479
+type updateMachineFunc func(machine *v1alpha1.Machine) error
+
+// UpdateMachineWithRetries updates a machine with given applyUpdate function. Note that machine not found error is ignored.
+func UpdateMachineWithRetries(ctx context.Context, machineClient v1alpha1client.MachineInterface, machineLister v1alpha1listers.MachineLister, namespace, name string, applyUpdate updateMachineFunc) (*v1alpha1.Machine, error) {
+	var machine *v1alpha1.Machine
+
+	retryErr := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		var err error
+		machine, err = machineLister.Machines(namespace).Get(name)
+		if err != nil {
+			return err
+		}
+		machine = machine.DeepCopy()
+		// Apply the update, then attempt to push it to the apiserver.
+		if applyErr := applyUpdate(machine); applyErr != nil {
+			return applyErr
+		}
+		machine, err = machineClient.Update(ctx, machine, metav1.UpdateOptions{})
+		return err
+	})
+
+	// Ignore the precondition violated error, this machine is already updated
+	// with the desired label.
+	if retryErr == errorsutil.ErrPreconditionViolated {
+		klog.V(4).Infof("Machine %s precondition doesn't hold, skip updating it.", name)
+		retryErr = nil
+	}
+
+	return machine, retryErr
+}
diff --git a/pkg/util/provider/options/types.go b/pkg/util/provider/options/types.go
index d1be4c2c9..08ff30da7 100644
--- a/pkg/util/provider/options/types.go
+++ b/pkg/util/provider/options/types.go
@@ -97,6 +97,9 @@ type SafetyOptions struct {
 	PvDetachTimeout metav1.Duration
 	// Timeout (in duration) used while waiting for PV to reattach on new node
 	PvReattachTimeout metav1.Duration
+	// Timeout (in duration) used while preserving a machine,
+	// beyond which preservation is stopped
+	MachinePreserveTimeout metav1.Duration
 	// Timeout (in duration) for which the APIServer can be down before
 	// declare the machine controller frozen by safety controller
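For reviewers, the annotation-value semantics introduced in `machineutils/utils.go` above can be exercised in isolation. This is a minimal standalone sketch, not part of the patch: the constants and sets are re-declared locally (as plain maps rather than `k8s.io/apimachinery` sets) instead of importing the package, and the names `allowedPreserveValues`/`preventAutoPreserveValues` are illustrative stand-ins for `AllowedPreserveAnnotationValues`/`PreventAutoPreserveAnnotationValues`.

```go
package main

import "fmt"

// Re-declared locally for illustration; the real constants live in
// pkg/util/provider/machineutils/utils.go.
const (
	preserveNow        = "now"
	preserveWhenFailed = "when-failed"
	preserveByMCM      = "auto-preserved"
	preserveFalse      = "false"
)

// allowedPreserveValues mirrors AllowedPreserveAnnotationValues; the empty
// string is allowed because an absent or cleared annotation is valid.
var allowedPreserveValues = map[string]bool{
	preserveNow: true, preserveWhenFailed: true,
	preserveByMCM: true, preserveFalse: true, "": true,
}

// preventAutoPreserveValues mirrors PreventAutoPreserveAnnotationValues: any
// explicit value (including "false") blocks auto-preservation by the
// machineset controller, since the user or MCM already decided.
var preventAutoPreserveValues = map[string]bool{
	preserveNow: true, preserveWhenFailed: true,
	preserveByMCM: true, preserveFalse: true,
}

func main() {
	for _, v := range []string{"now", "", "bogus"} {
		fmt.Printf("%q allowed=%v blocksAutoPreserve=%v\n",
			v, allowedPreserveValues[v], preventAutoPreserveValues[v])
	}
}
```

Note the asymmetry the two sets encode: `""` is an allowed annotation value but does not block auto-preservation, while `"false"` both is allowed and blocks it.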