Skip to content

Revert shell exec behavior change#637

Merged
mnovak merged 1 commit intoxtf-cz:mainfrom
laDok8:podshell_bugfix
Feb 2, 2026
Merged

Revert shell exec behavior change#637
mnovak merged 1 commit intoxtf-cz:mainfrom
laDok8:podshell_bugfix

Conversation

@laDok8
Copy link
Contributor

@laDok8 laDok8 commented Jan 30, 2026

Change of behavior was introduced in #634. This PR reverts that change and realigns executeWithRetry with the existing behavior of execute, which returns as soon as either stdout or stderr is present.

In the current implementation, setting executionDone.set(true) in onFailure causes fail-fast behavior. This breaks startup scenarios (for example, when a pod is still initializing). Previously, there was a one-minute wait window during which the pod could become ready and execute successfully.

Summary by Sourcery

Revert the recent change to pod shell execution retry behavior to again wait for meaningful command output instead of failing fast during startup.

New Features:

  • Add a helper method to execute pod shell commands with bash using the retry mechanism.

Bug Fixes:

  • Restore executeWithRetry semantics so it returns once either stdout or stderr is available, matching the non-retry execute behavior and avoiding premature failures during pod startup.

Enhancements:

  • Adjust retry loop to remember the last successful execution result and only throw when all attempts fail with exceptions.
  • Prevent the WebSocket onFailure callback from immediately marking execution as done to avoid interrupting valid startup retries.

@sourcery-ai
Copy link

sourcery-ai bot commented Jan 30, 2026

Reviewer's Guide

Reverts the recent fail-fast behavior in pod shell command execution, restores executeWithRetry semantics to match execute (returning when either stdout or stderr is available), adds a bash-specific retry helper, and adjusts retry loop and websocket failure handling to allow startup grace periods.

Sequence diagram for executeWithRetry and websocket failure handling

sequenceDiagram
  actor Caller
  participant PodShell
  participant PodWebSocketListener

  Caller->>PodShell: executeWithRetry(retryCount, retryDelay, commands)
  loop attempts 1..retryCount
    PodShell->>PodShell: execute(commands)
    alt stdout or stderr present
      PodShell-->>Caller: PodShellOutput
    else WaiterException thrown
      PodShell->>PodShell: log warn and continue
    end

    alt attempt < retryCount
      PodShell->>PodShell: Thread.sleep(retryDelay)
    end
  end

  alt no attempt produced output or error and no success
    PodShell-->>Caller: throws WaiterException
  end

  PodWebSocketListener-->>PodShell: onFailure(throwable, response)
  PodShell->>PodShell: ignore failure (do not set executionDone)
Loading

Class diagram for updated PodShell shell execution behavior

classDiagram
  class PodShell {
    +PodShellOutput executeWithBash(String command)
    +PodShellOutput executeWithBashRetry(String command)
    +PodShellOutput execute(String commands)
    +PodShellOutput executeWithRetry(int retryCount, int retryDelay, String commands)
  }

  class PodWebSocketListener {
    +void onOpen()
    +void onFailure(Throwable throwable, Response response)
  }

  PodShell --> PodWebSocketListener : uses
Loading

File-Level Changes

Change Details Files
Restore executeWithRetry semantics to return when either stdout or stderr is produced, while tracking the last successful attempt.
  • Introduce lastSuccessfulResult variable to keep track of the most recent successful PodShellOutput from execute
  • Update success condition to return as soon as either non-empty stdout or non-empty stderr is present
  • After exhausting retries, return lastSuccessfulResult if any attempt succeeded but produced empty output and error, otherwise throw WaiterException
core/src/main/java/cz/xtf/core/openshift/PodShell.java
Normalize retry loop timing behavior to always sleep between attempts, independent of exception type.
  • Move retry sleep logic out of the catch block so it runs whenever an attempt finishes and more retries remain
  • Preserve InterruptedException handling by re-setting the interrupt flag and throwing a RuntimeException
core/src/main/java/cz/xtf/core/openshift/PodShell.java
Prevent fail-fast behavior on WebSocket failure to allow pods time to become ready.
  • Change onFailure callback to no longer mark executionDone as true when a WebSocket error occurs
  • Retain a comment explaining that setting executionDone in onFailure would cause fail-fast behavior during pod startup
core/src/main/java/cz/xtf/core/openshift/PodShell.java
Add a bash-specific convenience method that uses the retrying execution path.
  • Introduce executeWithBashRetry helper that delegates to executeWithRetry with "bash -c"
  • Keep existing executeWithBash implementation untouched for non-retrying executions
core/src/main/java/cz/xtf/core/openshift/PodShell.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In onFailure, completely suppressing both the log and the executionDone change may make debugging connection/exec issues harder; consider keeping at least a warning log while still avoiding fail-fast behavior.
  • In executeWithRetry, the final RuntimeException thrown on InterruptedException loses the original cause; consider including the InterruptedException as the cause to aid troubleshooting.
  • The repeated checks for non-empty stdout/stderr in executeWithRetry could be factored into a small helper method to keep the retry loop logic simpler and more readable.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `onFailure`, completely suppressing both the log and the `executionDone` change may make debugging connection/exec issues harder; consider keeping at least a warning log while still avoiding fail-fast behavior.
- In `executeWithRetry`, the final `RuntimeException` thrown on `InterruptedException` loses the original cause; consider including the `InterruptedException` as the cause to aid troubleshooting.
- The repeated checks for non-empty stdout/stderr in `executeWithRetry` could be factored into a small helper method to keep the retry loop logic simpler and more readable.

## Individual Comments

### Comment 1
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:87-96` </location>
<code_context>
             } catch (WaiterException e) {
</code_context>

<issue_to_address>
**suggestion:** Final WaiterException loses information about the last underlying failure

With `lastSuccessfulResult` you now throw a generic `WaiterException` whenever no attempt succeeded (`lastSuccessfulResult == null`), losing the details of the last underlying `WaiterException`. Consider keeping a `lastException` and either rethrowing it or setting it as the cause of the final `WaiterException` so callers still see the specific failure reason.

Suggested implementation:

```java
            } catch (WaiterException e) {
                // If this was the last attempt, rethrow the underlying exception so callers
                // can see the specific failure reason instead of a generic WaiterException.
                if (attempt >= retryCount) {
                    throw e;
                }

                log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
                        attempt, retryCount, podName, e.getMessage());
            }

```

This change assumes:
1. The surrounding method either already declares `throws WaiterException` or handles it at a higher level.
2. `attempt` and `retryCount` are 1-based counters such that `attempt >= retryCount` correctly identifies the final attempt. If they are 0-based or calculated differently, adjust the condition accordingly (e.g. `attempt + 1 >= retryCount`).
If there is still a generic `WaiterException` being thrown elsewhere when `lastSuccessfulResult == null`, you may want to remove or simplify that logic since the last underlying `WaiterException` will now be rethrown on the final attempt.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@laDok8
Copy link
Contributor Author

laDok8 commented Jan 30, 2026

FYI i removed outdated Build and CheckStyle (8) branch protection rule so CI checks don't hang Build and CheckStyle (11) is present

@mnovak mnovak merged commit 973d9b9 into xtf-cz:main Feb 2, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants