@seanmcguire12 (Member) commented Jan 5, 2026

why

  • when a cached act or agent run fails on replay and has to rerun inference (self-heal), the default model was used every time. it should use the model the user originally specified.

what changed

  • added logic to pass the resolved LLMClient from Stagehand.act() & Stagehand.agent() into ActCache.tryReplay & AgentCache.tryReplay(AsStream), so cache hits reuse the same client as their original runs and only fall back to the default when no override is provided
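
The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchClient`, `ActCacheSketch`, and `resolveReplayClient` are hypothetical names, and the real `ActCache.tryReplay` takes more arguments.

```typescript
// Hypothetical, simplified stand-in for an LLM client; not the real Stagehand type.
interface SketchClient {
  modelName: string;
}

// Hypothetical stand-in for a cache that owns a default client.
class ActCacheSketch {
  private readonly defaultClient: SketchClient;

  constructor(defaultClient: SketchClient) {
    this.defaultClient = defaultClient;
  }

  // Returns the client a self-heal rerun would use during replay.
  resolveReplayClient(llmClientOverride?: SketchClient): SketchClient {
    // Prefer the client resolved from the caller's model option;
    // fall back to the default only when no override was passed.
    return llmClientOverride ?? this.defaultClient;
  }
}

const sketchDefault: SketchClient = { modelName: "default-model" };
const sketchCustom: SketchClient = { modelName: "custom-model" };
const actCacheSketch = new ActCacheSketch(sketchDefault);

const withOverride = actCacheSketch.resolveReplayClient(sketchCustom).modelName;
const withoutOverride = actCacheSketch.resolveReplayClient().modelName;
console.log(withOverride, withoutOverride);
```

Making the override optional keeps existing callers working: a caller that passes nothing still gets the default client.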

test plan

  • added unit tests that confirm the act & agent caches use the provided override client, and that non-LLM steps (like goto) replay successfully without requiring one
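
A test of this shape might look like the sketch below. It uses hypothetical stand-ins (`SpyClient`, `ReplayStep`, `replayStepsSketch`) in place of the real cache classes and test framework; the actual tests live in packages/core/tests/cache-llm-resolution.test.ts and may be structured differently.

```typescript
// Hypothetical spy client that counts how often inference is rerun.
interface SpyClient {
  name: string;
  calls: number;
}

// Hypothetical cached steps: goto never needs an LLM, act may need self-heal.
type ReplayStep =
  | { type: "goto"; url: string }
  | { type: "act"; selector: string; failsOnReplay: boolean };

function makeSpyClient(name: string): SpyClient {
  return { name, calls: 0 };
}

// Replays cached steps; only a failed act step reruns inference.
function replayStepsSketch(
  steps: ReplayStep[],
  defaultClient: SpyClient,
  llmClientOverride?: SpyClient,
): string[] {
  const log: string[] = [];
  for (const step of steps) {
    if (step.type === "goto") {
      // Deterministic step: no client is consulted at all.
      log.push(`goto ${step.url}`);
      continue;
    }
    if (step.failsOnReplay) {
      // Self-heal: use the override if given, else the default.
      const client = llmClientOverride ?? defaultClient;
      client.calls += 1;
      log.push(`self-heal via ${client.name}`);
    } else {
      log.push(`act ${step.selector}`);
    }
  }
  return log;
}

const defaultSpy = makeSpyClient("default");
const overrideSpy = makeSpyClient("override");

const healedLog = replayStepsSketch(
  [{ type: "act", selector: "#submit", failsOnReplay: true }],
  defaultSpy,
  overrideSpy,
);
const gotoLog = replayStepsSketch(
  [{ type: "goto", url: "https://example.com" }],
  defaultSpy,
);
console.log(healedLog, gotoLog, overrideSpy.calls, defaultSpy.calls);
```

Counting calls on both clients checks the two halves of the claim: the override is consulted on self-heal, and the default is never touched when an override exists.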

Summary by cubic

Fixes a bug where cache replay used the default model after an action failed; agent and act replays now use the requested model when inference is rerun. Addresses Linear STG-930.

  • Bug Fixes
    • Pass the resolved LLM client (from options.model) into ActCache and AgentCache replay paths, including streaming.
    • Use the effective client when re-executing act and fillForm steps during replay.
    • Keep normal cache hits unchanged; only reruns use the specified model.

Written for commit 615c997. Summary will update on new commits.

changeset-bot bot commented Jan 5, 2026

🦋 Changeset detected

Latest commit: 615c997

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |


@seanmcguire12 (Member, Author)

@greptileai

greptile-apps bot (Contributor) commented Jan 5, 2026

Greptile Summary

Fixes a bug where cached act() and agent() executions incorrectly used the default model instead of the user-specified model when cache replay failed and needed to re-run inference for self-healing.

  • Added llmClientOverride parameter to ActCache.tryReplay() and AgentCache.tryReplay() methods
  • Modified V3.act() to resolve the LLM client from options.model and pass it to cache replay
  • Updated V3.prepareAgentExecution() to return the resolved llmClient and pass it to both streaming and non-streaming agent cache replay paths
  • The resolved client is threaded through replayCachedActions(), replayAgentCacheEntry(), executeAgentReplayStep(), and down to takeDeterministicAction() calls
  • Normal cache hits continue to work unchanged; only failed actions that trigger self-heal inference now use the correct model
  • Comprehensive unit tests confirm the fix works for both act and agent caches, including validation that non-LLM steps (like goto) replay successfully without requiring a client
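
As a rough sketch of the agent-side change, the preparation step can resolve the client once and hand the same instance to both replay paths. Every name below (`AgentSketchClient`, `prepareAgentExecutionSketch`, `tryReplaySketch`, `tryReplayAsStreamSketch`) is a hypothetical simplification; the real methods take more parameters and the streaming path returns an actual stream rather than an array.

```typescript
// Hypothetical stand-ins; not the real Stagehand types or signatures.
interface AgentSketchClient {
  modelName: string;
}

interface AgentSketchOptions {
  model?: string;
}

const SKETCH_DEFAULT_MODEL = "default-model";

// Resolves a client from the requested model, or the default if none is given.
function resolveSketchClient(model?: string): AgentSketchClient {
  return { modelName: model ?? SKETCH_DEFAULT_MODEL };
}

// Returns the resolved client so callers can thread it into cache replay.
function prepareAgentExecutionSketch(
  options: AgentSketchOptions,
): { llmClient: AgentSketchClient } {
  return { llmClient: resolveSketchClient(options.model) };
}

// Non-streaming replay path.
function tryReplaySketch(llmClient: AgentSketchClient): string {
  return `replay with ${llmClient.modelName}`;
}

// Streaming replay path, modeled here as a list of chunks.
function tryReplayAsStreamSketch(llmClient: AgentSketchClient): string[] {
  return [`stream replay with ${llmClient.modelName}`];
}

const preparedSketch = prepareAgentExecutionSketch({ model: "custom-model" });
const nonStreamingResult = tryReplaySketch(preparedSketch.llmClient);
const streamingResult = tryReplayAsStreamSketch(preparedSketch.llmClient);
console.log(nonStreamingResult, streamingResult);
```

Resolving once and passing the result down means the streaming and non-streaming paths cannot disagree about which model a self-heal rerun uses.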

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation correctly threads the LLM client through all cache replay paths, the logic is sound, comprehensive unit tests validate the behavior, and the changes are backward-compatible (falling back to default client when no override is provided)
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/cache/ActCache.ts | Added llmClientOverride parameter to tryReplay and replayCachedActions methods to use correct model during cache replay with self-heal |
| packages/core/lib/v3/cache/AgentCache.ts | Added llmClientOverride parameter throughout agent cache replay chain to use correct model when cached actions fail and need re-execution |
| packages/core/lib/v3/v3.ts | Modified act method and prepareAgentExecution to resolve and pass LLM client to cache replay methods, ensuring user-specified model is used during cache self-heal |
| packages/core/tests/cache-llm-resolution.test.ts | New comprehensive unit tests validating that ActCache and AgentCache use override LLM client during replay and that non-LLM steps work without client |

Sequence Diagram

sequenceDiagram
    participant User
    participant V3
    participant ActCache
    participant AgentCache
    participant ActHandler
    participant LLM

    Note over User,LLM: Act Cache Flow
    User->>V3: act(instruction, {model: "custom-model"})
    V3->>V3: resolveLlmClient(options.model)
    V3->>ActCache: prepareContext(instruction, page)
    ActCache-->>V3: actCacheContext
    V3->>ActCache: tryReplay(context, page, timeout, llmClient)
    ActCache->>ActCache: replayCachedActions(context, entry, page, timeout, llmClient)
    ActCache->>ActHandler: takeDeterministicAction(action, page, timeout, llmClient)
    alt Action succeeds
        ActHandler-->>ActCache: success
    else Action fails (self-heal)
        ActHandler->>LLM: Use llmClient (not default!)
        LLM-->>ActHandler: New action
        ActHandler-->>ActCache: Updated action
    end
    ActCache-->>V3: ActResult
    V3-->>User: Result with correct model used

    Note over User,LLM: Agent Cache Flow
    User->>V3: agent({model: "custom-model"}).execute(instruction)
    V3->>V3: prepareAgentExecution(options)
    V3->>V3: resolveLlmClient(options.model)
    V3->>AgentCache: tryReplay(context, llmClient)
    AgentCache->>AgentCache: replayAgentCacheEntry(context, entry, llmClient)
    loop Each step
        AgentCache->>AgentCache: executeAgentReplayStep(step, ctx, handler, llmClient)
        alt Act/FillForm step
            AgentCache->>ActHandler: takeDeterministicAction(action, page, timeout, llmClient)
            alt Action fails (self-heal)
                ActHandler->>LLM: Use llmClient (not default!)
                LLM-->>ActHandler: New action
            end
        else Goto/Scroll/Wait step
            Note over AgentCache: No LLM needed
        end
    end
    AgentCache-->>V3: AgentResult
    V3-->>User: Result with correct model used

@seanmcguire12 seanmcguire12 marked this pull request as ready for review January 5, 2026 23:05
@cubic-dev-ai cubic-dev-ai bot (Contributor) left a comment

1 issue found across 6 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".changeset/mean-melons-repeat.md">

<violation number="1" location=".changeset/mean-melons-repeat.md:5">
P2: Changeset message describes the bug behavior rather than the fix. The description says it &#39;uses default model&#39; but the actual fix makes cached replays use the *originally specified model* instead of the default. Consider rewording to clarify the fix, e.g., &quot;fix: replaying cached actions (for agent &amp; act) now uses the originally-specified model instead of default when rerunning inference after failure&quot;</violation>
</file>


"@browserbasehq/stagehand": patch
---

fix: replaying cached actions (for agent & act) uses default model when action fails and rerunning inference is needed

@cubic-dev-ai cubic-dev-ai bot (Contributor) commented Jan 5, 2026

P2: Changeset message describes the bug behavior rather than the fix. The description says it 'uses default model' but the actual fix makes cached replays use the originally specified model instead of the default. Consider rewording to clarify the fix, e.g., "fix: replaying cached actions (for agent & act) now uses the originally-specified model instead of default when rerunning inference after failure"


✅ Addressed in 2240a0d

@seanmcguire12 seanmcguire12 force-pushed the seanmcguire/stg-930-bug-replaying-cached-agent-actions-uses-default-model branch from 2240a0d to 615c997 Compare January 5, 2026 23:10
@seanmcguire12 seanmcguire12 merged commit 088c4cc into main Jan 5, 2026
30 of 31 checks passed