How to categorize this issue?
/area auto-scaling
/kind bug
/priority 3
What happened:
We recently identified a bug: when an external actor (for example, cluster-autoscaler) performs multiple sequential MCD replica scale-downs, the MCS controller repeatedly picks the same machine for deletion.
This causes problems, especially with how the external actor processes the scale-down. For example, when using cluster-autoscaler with the MCM provider, cluster-autoscaler cordons the nodes it has marked for termination. Because the same machine keeps being picked, this leads to cluster states with multiple cordoned nodes that are never deleted.
What you expected to happen:
When selecting a machine for deletion while scaling down replicas, MCM should skip machines that are already in a terminating state.
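The expected selection behavior can be sketched as a simple filter over the candidate machines. This is an illustrative sketch only, not MCM's actual implementation: the Machine type, the Phase field, and the "Terminating" phase string are assumptions standing in for the real machine-controller-manager API types.

```go
package main

import "fmt"

// Machine is a minimal stand-in for MCM's machine object (assumption; the
// real type and phase constants live in the machine-controller-manager API).
type Machine struct {
	Name  string
	Phase string // e.g. "Running", "Terminating"
}

// pickForDeletion selects up to count machines for scale-down, skipping
// machines that are already terminating so the same machine is never
// picked twice across sequential scale-downs.
func pickForDeletion(machines []Machine, count int) []Machine {
	var picked []Machine
	for _, m := range machines {
		if m.Phase == "Terminating" {
			continue // already being deleted; do not select again
		}
		picked = append(picked, m)
		if len(picked) == count {
			break
		}
	}
	return picked
}

func main() {
	ms := []Machine{
		{Name: "m1", Phase: "Terminating"},
		{Name: "m2", Phase: "Running"},
		{Name: "m3", Phase: "Running"},
	}
	// m1 is already terminating, so the scale-down picks m2 instead.
	fmt.Println(pickForDeletion(ms, 1)[0].Name)
}
```

With this filter, a second sequential scale-down would see the first machine still in a terminating phase and move on to a fresh candidate, avoiding the pile-up of cordoned-but-undeleted nodes described above.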
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- Others: