Missing "name" label breaks rate limiting in canMarkMachineFailed function #1077

@v1dusss

How to categorize this issue?

/area control-plane
/area robustness
/kind bug
/priority 3

What happened:
MCM has a safety mechanism that prevents marking too many Unknown machines as Failed simultaneously, implemented in the canMarkMachineFailed function.
However, this rate limit hardcodes a label selector that looks for the key "name" with a value equal to the MachineDeployment name:

	var (
		list     = []string{machineDeployName}
		selector = labels.NewSelector()
		req, _   = labels.NewRequirement("name", selection.Equals, list)
	)

(Reference: pkg/util/provider/machinecontroller/machine_util.go#L2138-L2142)

We use MCM in a few "non-gardener" clusters and therefore create the MachineDeployment objects on our own.

If a MachineDeployment omits this "name" label in spec.template.metadata.labels, the selector matches zero machines. The inProgress counter then stays at 0, so the maxReplacements check is bypassed entirely (0 < maxReplacements is true for any positive limit). Consequently, if nodes stay Unknown for more than 10 minutes, MCM marks all of them as Failed simultaneously, ignoring the safety limit.
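
To make the bypass concrete, here is a minimal, stdlib-only sketch of the counting logic. It is not the MCM code: the `machine` type, the `countInProgress` and `canMarkFailed` helpers, the phase strings, and the rule for what counts as "in progress" are all assumptions made for illustration. Only the "name" label match and the `inProgress < maxReplacements` comparison are taken from the description above.

```go
package main

import "fmt"

// machine is a simplified, hypothetical stand-in for an MCM Machine object.
type machine struct {
	labels map[string]string
	phase  string
}

// countInProgress mirrors the label-based lookup: a machine is only
// considered when its "name" label equals the MachineDeployment name.
// Assumption: any phase other than Running or Unknown counts as in progress.
func countInProgress(machines []machine, machineDeployName string) int {
	inProgress := 0
	for _, m := range machines {
		if m.labels["name"] != machineDeployName {
			continue // not matched by the selector, so never counted
		}
		if m.phase != "Running" && m.phase != "Unknown" {
			inProgress++
		}
	}
	return inProgress
}

// canMarkFailed applies the rate limit: another machine may be marked
// Failed only while fewer than maxReplacements are already in progress.
func canMarkFailed(machines []machine, machineDeployName string, maxReplacements int) bool {
	return countInProgress(machines, machineDeployName) < maxReplacements
}

func main() {
	withLabel := []machine{
		{labels: map[string]string{"name": "md-a"}, phase: "Failed"},
		{labels: map[string]string{"name": "md-a"}, phase: "Unknown"},
	}
	withoutLabel := []machine{
		{labels: map[string]string{"app": "worker"}, phase: "Failed"},
		{labels: map[string]string{"app": "worker"}, phase: "Unknown"},
	}
	fmt.Println("with label:   ", canMarkFailed(withLabel, "md-a", 1))
	fmt.Println("without label:", canMarkFailed(withoutLabel, "md-a", 1))
}
```

With the label present, the already Failed machine is counted and the limit of 1 blocks further marking. Without the label, the same machines are never counted, so the check passes every time.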

What you expected to happen:
The function should correctly identify the machines of a MachineDeployment regardless of whether the "name" label is set. Better approaches include:

  • Looking up machines via OwnerReferences (Machine -> MachineSet -> MachineDeployment).
  • Alternatively, enforcing the "name" label via a validation webhook at creation time, if it is strictly required for internal logic.
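
The OwnerReferences approach could look roughly like the following stdlib-only sketch. The `ownerRef` and `object` types and the `owningDeployment` and `machinesOfDeployment` helpers are hypothetical simplifications. A real implementation would presumably work with `metav1.OwnerReference` on the actual Machine and MachineSet objects, and would likely compare UIDs rather than names.

```go
package main

import "fmt"

// ownerRef is a simplified, hypothetical stand-in for an owner reference.
type ownerRef struct {
	kind string
	name string
}

// object is a simplified, hypothetical stand-in for a Machine or MachineSet.
type object struct {
	name   string
	owners []ownerRef
}

// owningDeployment walks Machine -> MachineSet -> MachineDeployment and
// returns the MachineDeployment name, if the chain can be resolved.
func owningDeployment(m object, machineSets map[string]object) (string, bool) {
	for _, ref := range m.owners {
		if ref.kind != "MachineSet" {
			continue
		}
		ms, ok := machineSets[ref.name]
		if !ok {
			continue // referenced MachineSet is unknown
		}
		for _, msRef := range ms.owners {
			if msRef.kind == "MachineDeployment" {
				return msRef.name, true
			}
		}
	}
	return "", false
}

// machinesOfDeployment selects machines by ownership instead of by label.
func machinesOfDeployment(deploy string, machines []object, machineSets map[string]object) []string {
	var names []string
	for _, m := range machines {
		if owner, ok := owningDeployment(m, machineSets); ok && owner == deploy {
			names = append(names, m.name)
		}
	}
	return names
}

func main() {
	machineSets := map[string]object{
		"md-a-7f9c": {
			name:   "md-a-7f9c",
			owners: []ownerRef{{kind: "MachineDeployment", name: "md-a"}},
		},
	}
	machines := []object{
		{name: "m-1", owners: []ownerRef{{kind: "MachineSet", name: "md-a-7f9c"}}},
		{name: "m-2", owners: []ownerRef{{kind: "MachineSet", name: "md-a-7f9c"}}},
		{name: "orphan"}, // no owner, so it belongs to no MachineDeployment
	}
	fmt.Println(machinesOfDeployment("md-a", machines, machineSets))
}
```

Because the lookup never reads the machines' labels, it would keep working for hand-written MachineDeployment objects that omit the "name" label.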

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
MCM: v0.61.1
