
Conversation

@JoshuaGabriel
Collaborator

Reads the failure domain of a pool, then upmaps any backfill_toofull PG onto another OSD chosen by % utilization.

@JoshuaGabriel JoshuaGabriel requested a review from sam0044 November 4, 2025 06:59
Signed-off-by: Joshua Blanch <joshua.blanch@clyso.com>

Problem:
Usually when a node goes down or when draining capacity, some OSDs become nearfull, which can eventually lead to PGs entering the backfill_toofull warning state.
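The core selection step the description implies (pick a less-full OSD to upmap the stuck PG onto) can be sketched roughly as below. This is a hedged illustration, not the script itself: the function name `pick_target_osd` and its signature are hypothetical, and in a real script the utilization figures would come from `ceph osd df -f json` and the remap would be applied with `ceph osd pg-upmap-items <pgid> <from-osd> <to-osd>`.

```python
def pick_target_osd(full_osd, candidates, utilization, threshold=0.85):
    """Return the least-utilized candidate OSD below the threshold, or None.

    full_osd    -- the OSD id whose fullness is causing backfill_toofull
    candidates  -- OSD ids in the pool's failure domain that could host the PG
    utilization -- mapping of OSD id -> fraction used (e.g. from `ceph osd df`)
    threshold   -- fullness ratio above which an OSD is not a valid target

    (Illustrative sketch; names and threshold are assumptions, not the PR's code.)
    """
    eligible = [
        o for o in candidates
        if o != full_osd and utilization.get(o, 1.0) < threshold
    ]
    if not eligible:
        return None  # nothing safe to upmap onto in this failure domain
    return min(eligible, key=lambda o: utilization[o])
```

A real run would then issue one `ceph osd pg-upmap-items` call per affected PG, moving it off the full OSD onto the chosen target.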
Collaborator

"Usually when a node goes down or when draining capacity" seems broken. Should it be "Usually when a node goes down or when draining with limited capacity"?

Collaborator

@sam0044 left a comment


This is a nice script to have. I would rephrase the problem statement to be a bit clearer; outside of that, the logic looks pretty solid.

@JoshuaGabriel
Collaborator Author

Actually, I don't think this takes the CRUSH rule's device class into account; I've only tried it on an all-NVMe cluster. On a mixed HDD/SSD cluster it could create an upmap to an OSD outside the PG's device class.
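The device-class concern above could be addressed by filtering candidate OSDs to those sharing the full OSD's class. A minimal sketch, assuming input shaped like the `nodes` array from `ceph osd tree -f json` (where OSD entries carry `id`, `type`, and `device_class` fields); the helper name `same_class_candidates` is hypothetical, not part of the PR:

```python
def same_class_candidates(full_osd, osd_tree_nodes):
    """Restrict upmap candidates to OSDs sharing the full OSD's device class.

    osd_tree_nodes -- list of node dicts as found in `ceph osd tree -f json`;
    non-OSD nodes (hosts, racks, the root) are skipped.
    (Illustrative sketch of the suggested fix, not the merged code.)
    """
    classes = {
        n["id"]: n.get("device_class")
        for n in osd_tree_nodes
        if n.get("type") == "osd"
    }
    cls = classes.get(full_osd)
    return [o for o, c in classes.items() if c == cls and o != full_osd]
```

Composing this with the utilization check would keep the upmap inside both the failure domain and the device class the CRUSH rule expects.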
