Multiple sequential MCD replica reduction repeatedly picks the same machine for deletion #1084

@aaronfern

Description

How to categorize this issue?

/area auto-scaling
/kind bug
/priority 3

What happened:
We recently identified a bug: when an external actor (for example, cluster-autoscaler) performs multiple sequential MCD replica scale-downs, the MCS controller repeatedly picks the same machine for deletion.
This causes problems, especially with how the external actor processes the scale-down. For example, when using cluster-autoscaler with the mcm provider, cluster-autoscaler cordons the nodes it has marked for termination. This leads to cluster states with multiple cordoned nodes that are never deleted.

What you expected to happen:
When selecting a machine for deletion due to a replica reduction, MCM should not select machines that are already in a terminating state.
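The expected behavior can be sketched as a filter applied before the deletion candidate is chosen. This is a hypothetical illustration, not MCM's actual selection code; the `Machine` struct and `pickForDeletion` helper are invented here, and the real controller works with `v1alpha1.Machine` objects whose status carries a phase such as "Terminating".

```go
package main

import "fmt"

// Machine is a simplified stand-in for MCM's machine object,
// carrying only the fields this sketch needs.
type Machine struct {
	Name  string
	Phase string // e.g. "Running", "Terminating"
}

// pickForDeletion returns up to count machines to delete for a
// replica scale-down, skipping machines already in a terminating
// state so that sequential scale-downs select distinct machines.
func pickForDeletion(machines []Machine, count int) []string {
	var picked []string
	for _, m := range machines {
		if len(picked) == count {
			break
		}
		if m.Phase == "Terminating" {
			// Already chosen by a previous scale-down; do not pick again.
			continue
		}
		picked = append(picked, m.Name)
	}
	return picked
}

func main() {
	machines := []Machine{
		{Name: "machine-a", Phase: "Terminating"}, // from an earlier scale-down
		{Name: "machine-b", Phase: "Running"},
		{Name: "machine-c", Phase: "Running"},
	}
	// A second scale-down of one replica should pick machine-b,
	// not machine-a again.
	fmt.Println(pickForDeletion(machines, 1)) // [machine-b]
}
```

Without the phase check, both scale-downs would resolve to the same machine, matching the symptom described above.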

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

Metadata

Labels

  • area/auto-scaling: Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related
  • kind/bug: Bug
  • priority/1: Priority (lower number equals higher priority)
