fix: rotate pods on pod-config change #2299
Graphite Automations: "Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR (02/26/26). 3 reviewers were added to this PR based on Anton Bykov's automation.
}

func (r *containerReconcilerLoop) deletePodOnConfigVersionMismatch(ctx context.Context) error {
	ctx, logger, end := instrumentation.GetLogSpan(ctx, "deletePodOnConfigVersionMismatch")
nit: you can pass an empty string as the span name to avoid creating a duplicate span (a deletePodOnConfigVersionMismatch span is already created by the step engine, because this is a SimpleStep)
	mode := r.container.Spec.Mode
	if mode == weka.WekaContainerModeDiscovery ||
		mode == weka.WekaContainerModeDriversDist ||
		mode == weka.WekaContainerModeDriversLoader ||
		mode == weka.WekaContainerModeDriversBuilder {
		return nil
	}
Maybe use r.container.IsServiceContainer() here?
	siblings, err := r.getClusterContainers(ctx)
	if err != nil {
		return fmt.Errorf("failed to get cluster containers: %w", err)
	}
We cannot always get the cluster containers.
This works only if the current container has an owner that is a wekacluster.
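To illustrate the concern above, here is a minimal sketch of checking for a cluster owner before trying to list siblings. `OwnerRef`, `clusterOwner`, and the kind string `"WekaCluster"` are all hypothetical stand-ins, not the operator's actual types or API; what the step should do when there is no cluster owner is left open.

```go
package main

import "fmt"

// OwnerRef is a simplified, hypothetical stand-in for a Kubernetes
// owner reference; the real operator types are not shown in this PR.
type OwnerRef struct {
	Kind string
	Name string
}

// clusterOwnerKind is an assumed kind string for the owning cluster resource.
const clusterOwnerKind = "WekaCluster"

// clusterOwner returns the name of the owning cluster, if any.
// The second return value is false when the container has no cluster owner,
// in which case sibling containers cannot be looked up through the cluster.
func clusterOwner(refs []OwnerRef) (string, bool) {
	for _, ref := range refs {
		if ref.Kind == clusterOwnerKind {
			return ref.Name, true
		}
	}
	return "", false
}

func main() {
	owned := []OwnerRef{{Kind: "WekaCluster", Name: "cluster-a"}}
	standalone := []OwnerRef{}

	if name, ok := clusterOwner(owned); ok {
		fmt.Println("siblings can be listed via cluster", name)
	}
	if _, ok := clusterOwner(standalone); !ok {
		fmt.Println("no cluster owner: sibling lookup is not possible")
	}
}
```

With a guard like this, the sibling lookup would only be attempted for containers that actually belong to a cluster.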
		// If any sibling is unstable, defer rotation
		status := sibling.Status.Status
		if (status == weka.PodTerminating || status == weka.PodNotRunning || status == weka.Init) && sibling.DeletionTimestamp == nil {
			logger.Info("Deferring pod config rotation: sibling container is unstable",
				"sibling", sibling.Name,
				"siblingStatus", status,
			)
			return lifecycle.NewWaitError(fmt.Errorf("deferring config rotation: sibling %s is in %s state", sibling.Name, status))
		}
We might have cases where some wekacontainers are stuck in PodTerminating, PodNotRunning, or Deleting (because a node is not ready, or there are not enough resources on the nodes to create the pod).
I'm not sure it's possible to get all pods stable in a real environment.
	if exists && podHash == currentHash {
		return nil
	}
We could check this earlier, in the step predicates (along with the other simple checks that come before this one).
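A rough sketch of what moving the hash comparison into a predicate could look like. The `Step`, `Predicate`, and `Execute` names below are hypothetical and only mimic the general idea of a step with predicates; the operator's real step engine may be shaped differently.

```go
package main

import "fmt"

// Predicate reports whether a step should run at all.
type Predicate func() bool

// Step is a hypothetical, simplified model of a step-engine step:
// Run is executed only when every predicate returns true.
type Step struct {
	Name       string
	Predicates []Predicate
	Run        func() error
}

// Execute evaluates the predicates in order and skips the step
// (returning ran=false) as soon as one of them fails.
func (s Step) Execute() (ran bool, err error) {
	for _, p := range s.Predicates {
		if !p() {
			return false, nil
		}
	}
	return true, s.Run()
}

// podConfigIsStale is the negation of the early-return check in the diff:
// the step is needed unless the pod exists and its hash matches the current one.
func podConfigIsStale(exists bool, podHash, currentHash string) Predicate {
	return func() bool {
		return !(exists && podHash == currentHash)
	}
}

func main() {
	step := Step{
		Name:       "deletePodOnConfigVersionMismatch",
		Predicates: []Predicate{podConfigIsStale(true, "abc", "abc")},
		Run: func() error {
			fmt.Println("rotating pod")
			return nil
		},
	}
	ran, err := step.Execute()
	fmt.Println("ran:", ran, "err:", err)
}
```

Expressed this way, the cheap checks gate the step before it starts, so the step body only has to deal with the sibling and rotation logic.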

TL;DR
Added automatic pod rotation when configuration version changes to ensure pods run with the latest configuration.
What changed?
Added deletePodOnConfigVersionMismatch to the active state flow. The step checks for configuration version mismatches between running pods and the current configuration.
How to test?
Why make this change?
This ensures that running pods automatically pick up configuration changes without manual intervention. The priority-based rotation system maintains cluster stability by rotating critical components (drives) first and preventing multiple simultaneous rotations that could impact service availability.
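The rotation policy described above (drives first, never more than one rotation at a time) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `Container` type, the mode strings, and the `rotationPriority` ordering beyond "drives first" are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Container is a hypothetical, simplified stand-in for a WekaContainer.
type Container struct {
	Name     string
	Mode     string // illustrative values such as "drive" or "compute"
	PodHash  string // config hash the running pod was created with
	Rotating bool   // true while the pod is being replaced
}

// rotationPriority returns a rank where lower values rotate first.
// Only "drives first" comes from the PR description; the rest is assumed.
func rotationPriority(mode string) int {
	if mode == "drive" {
		return 0
	}
	return 1
}

// nextToRotate picks the single container whose pod should be rotated next.
// It returns nil when a rotation is already in progress or nothing is stale.
func nextToRotate(containers []Container, currentHash string) *Container {
	var stale []Container
	for _, c := range containers {
		if c.Rotating {
			return nil // allow only one rotation at a time
		}
		if c.PodHash != currentHash {
			stale = append(stale, c)
		}
	}
	if len(stale) == 0 {
		return nil
	}
	// Stable sort keeps the input order among containers of equal priority.
	sort.SliceStable(stale, func(i, j int) bool {
		return rotationPriority(stale[i].Mode) < rotationPriority(stale[j].Mode)
	})
	return &stale[0]
}

func main() {
	containers := []Container{
		{Name: "compute-0", Mode: "compute", PodHash: "old"},
		{Name: "drive-0", Mode: "drive", PodHash: "old"},
		{Name: "drive-1", Mode: "drive", PodHash: "new"},
	}
	if next := nextToRotate(containers, "new"); next != nil {
		fmt.Println("rotate next:", next.Name)
	}
}
```

In this sketch the stale drive container is selected ahead of the stale compute container, and no container is selected while another one is still rotating.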