[fix]: use correct model on cache replay failure #1498

seanmcguire12 · 2026-01-05T21:18:35Z

why

when cached act & agent runs fail and need to rerun inference (self-heal), the default model was used every time. they should have been using the model originally specified by the user.

what changed

added logic to pass the resolved LLMClient from Stagehand.act() & Stagehand.agent() into ActCache.tryReplay & AgentCache.tryReplay / AgentCache.tryReplayAsStream, so cache hits reuse the same client as their original runs and only fall back to the default when no override is provided
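A minimal sketch of the fallback rule, assuming a heavily simplified cache. `ActCacheSketch`, `CachedActEntry`, `ReplayOutcome`, and the `{ modelName }` client shape are hypothetical stand-ins, not Stagehand's actual `ActCache` or `LLMClient` types; only the "override if provided, else default" choice reflects the description above.

```typescript
// Hypothetical minimal stand-in for Stagehand's LLMClient (the real class has a much larger API).
interface LLMClient {
  modelName: string;
}

// Hypothetical shape of a cached act entry.
interface CachedActEntry {
  instruction: string;
  selector: string;
}

interface ReplayOutcome {
  hit: boolean;
  // Model that would be used if a cached step fails and inference has to rerun.
  selfHealModel?: string;
}

// Simplified ActCache-like class: only the override-or-default rule mirrors the PR.
class ActCacheSketch {
  private readonly entries: Map<string, CachedActEntry>;
  private readonly defaultClient: LLMClient;

  constructor(entries: Map<string, CachedActEntry>, defaultClient: LLMClient) {
    this.entries = entries;
    this.defaultClient = defaultClient;
  }

  tryReplay(instruction: string, llmClientOverride?: LLMClient): ReplayOutcome {
    const entry = this.entries.get(instruction);
    if (!entry) return { hit: false };
    // Prefer the client resolved from the caller's options.model; fall back only when none was passed.
    const effectiveClient = llmClientOverride ?? this.defaultClient;
    return { hit: true, selfHealModel: effectiveClient.modelName };
  }
}

const cache = new ActCacheSketch(
  new Map([["click login", { instruction: "click login", selector: "#login" }]]),
  { modelName: "default-model" },
);
const withOverride = cache.tryReplay("click login", { modelName: "custom-model" });
const withoutOverride = cache.tryReplay("click login");
const miss = cache.tryReplay("open settings");
```

Passing nothing keeps the pre-fix behaviour, which is what makes the change backward-compatible for callers that never set `options.model`.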

test plan

added unit tests which confirm that the act & agent caches use the provided override client, and that non-LLM steps (like goto) replay successfully without requiring one
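One way to picture what these tests check, using a hypothetical spy client that counts inference calls. `SpyClient`, `ReplayStep`, and `replayAgentSteps` are illustrative names, not the contents of `cache-llm-resolution.test.ts`; the sketch only exercises the two properties listed above: a self-healing step uses the override client, and a goto-only replay needs no client at all.

```typescript
// Hypothetical spy standing in for an LLM client: counts "inference" calls instead of calling a model.
class SpyClient {
  readonly modelName: string;
  calls = 0;

  constructor(modelName: string) {
    this.modelName = modelName;
  }

  // Fake inference: derive a fresh selector from the instruction text.
  heal(instruction: string): string {
    this.calls += 1;
    return `#${instruction.replace(/\s+/g, "-")}`;
  }
}

type ReplayStep =
  | { type: "goto"; url: string }
  | { type: "act"; instruction: string; cachedSelector: string };

// Hypothetical agent replay; `liveSelectors` models which elements still exist on the page.
function replayAgentSteps(
  steps: ReplayStep[],
  liveSelectors: Set<string>,
  defaultClient?: SpyClient,
  llmClientOverride?: SpyClient,
): string[] {
  const performed: string[] = [];
  for (const step of steps) {
    if (step.type === "goto") {
      // Navigation never needs a model, so no client is required here.
      performed.push(`goto ${step.url}`);
      continue;
    }
    if (liveSelectors.has(step.cachedSelector)) {
      // Cache hit: replay the stored selector without inference.
      performed.push(`click ${step.cachedSelector}`);
      continue;
    }
    // Stale selector: self-heal with the override if given, else the default.
    const client = llmClientOverride ?? defaultClient;
    if (!client) throw new Error(`no LLM client available to heal "${step.instruction}"`);
    performed.push(`click ${client.heal(step.instruction)}`);
  }
  return performed;
}

const defaultClient = new SpyClient("default-model");
const customClient = new SpyClient("custom-model");
const healed = replayAgentSteps(
  [
    { type: "goto", url: "https://example.com" },
    { type: "act", instruction: "click login", cachedSelector: "#old-login" },
  ],
  new Set(["#click-login"]),
  defaultClient,
  customClient,
);
// A goto-only replay with no client at all should still succeed.
const gotoOnly = replayAgentSteps([{ type: "goto", url: "https://example.com" }], new Set<string>());
```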

Summary by cubic

Fixes a bug where cache replay used the default model after an action failed; agent and act replays now use the requested model when inference is rerun. Addresses Linear STG-930.

Bug Fixes
- Pass the resolved LLM client (from options.model) into ActCache and AgentCache replay paths, including streaming.
- Use the effective client when re-executing act and fillForm steps during replay.
- Keep normal cache hits unchanged; only reruns use the specified model.
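A sketch of why normal cache hits are unaffected, assuming each replayed action is first tried deterministically and the model is consulted only when that attempt fails. `takeDeterministicActionSketch`, `replayFillForm`, `InferenceClient`, and the `inferSelector` hook are hypothetical; per the sequence diagram, the real `takeDeterministicAction` also receives a page and a timeout.

```typescript
// Hypothetical minimal client: only the inference hook this sketch needs.
interface InferenceClient {
  modelName: string;
  inferSelector(instruction: string): string;
}

interface CachedAction {
  instruction: string;
  selector: string;
}

interface ActionResult {
  selector: string;
  selfHealed: boolean;
  model?: string;
}

// Replays one cached action; falls back to inference only if the cached selector no longer works.
function takeDeterministicActionSketch(
  action: CachedAction,
  pageHas: (selector: string) => boolean,
  client: InferenceClient,
): ActionResult {
  if (pageHas(action.selector)) {
    // Normal cache hit: no model call at all, so the fix does not change this path.
    return { selector: action.selector, selfHealed: false };
  }
  // Self-heal path: this is where the effective (override-or-default) client matters.
  const selector = client.inferSelector(action.instruction);
  return { selector, selfHealed: true, model: client.modelName };
}

// fillForm replay: every cached field goes through the same deterministic-then-heal path.
function replayFillForm(
  fields: CachedAction[],
  pageHas: (selector: string) => boolean,
  client: InferenceClient,
): ActionResult[] {
  return fields.map((field) => takeDeterministicActionSketch(field, pageHas, client));
}

let inferenceCalls = 0;
const overrideClient: InferenceClient = {
  modelName: "custom-model",
  inferSelector: (instruction) => {
    inferenceCalls += 1;
    return `#${instruction.split(" ").join("-")}`;
  },
};
const live = new Set(["#email", "#fill-password"]);
const results = replayFillForm(
  [
    { instruction: "fill email", selector: "#email" },
    { instruction: "fill password", selector: "#pw-old" },
  ],
  (selector) => live.has(selector),
  overrideClient,
);
```

Only the stale `#pw-old` field triggers inference, and it runs on the caller's model; the `#email` field replays with zero model calls.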

Written for commit 615c997. Summary will update on new commits.

changeset-bot · 2026-01-05T21:18:38Z

🦋 Changeset detected

Latest commit: 615c997

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server	Patch

seanmcguire12 · 2026-01-05T21:21:30Z

@greptileai

greptile-apps · 2026-01-05T21:23:30Z

Greptile Summary

Fixes a bug where cached act() and agent() executions incorrectly used the default model instead of the user-specified model when cache replay failed and needed to re-run inference for self-healing.

- Added llmClientOverride parameter to ActCache.tryReplay() and AgentCache.tryReplay() methods
- Modified V3.act() to resolve the LLM client from options.model and pass it to cache replay
- Updated V3.prepareAgentExecution() to return the resolved llmClient and pass it to both streaming and non-streaming agent cache replay paths
- The resolved client is threaded through replayCachedActions(), replayAgentCacheEntry(), executeAgentReplayStep(), and down to takeDeterministicAction() calls
- Normal cache hits continue to work unchanged; only failed actions that trigger self-heal inference now use the correct model
- Comprehensive unit tests confirm the fix works for both act and agent caches, including validation that non-LLM steps (like goto) replay successfully without requiring a client
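A sketch of the threading described in this summary, under simplified signatures: the prepare step resolves the client once and returns it, and the same object is handed to both a non-streaming and a streaming (generator-based) replay. Every name ending in `Sketch`, and the idea that resolving a model simply wraps its name, are assumptions for illustration rather than the real `V3.prepareAgentExecution()` or `AgentCache` APIs.

```typescript
// Hypothetical minimal client type; the real LLMClient exposes much more.
interface AgentLLMClient {
  modelName: string;
}

interface AgentOptions {
  model?: string;
}

interface PreparedExecution {
  instruction: string;
  llmClient: AgentLLMClient;
}

// Hypothetical resolver: a named model gets its own client, otherwise the default is reused.
function resolveLlmClientSketch(defaultClient: AgentLLMClient, model?: string): AgentLLMClient {
  return model ? { modelName: model } : defaultClient;
}

// Resolve once and return the client alongside the rest of the prepared state.
function prepareAgentExecutionSketch(
  instruction: string,
  options: AgentOptions,
  defaultClient: AgentLLMClient,
): PreparedExecution {
  return { instruction, llmClient: resolveLlmClientSketch(defaultClient, options.model) };
}

// Non-streaming replay: reports which model any self-heal would use.
function tryReplaySketch(instruction: string, llmClient: AgentLLMClient): string {
  return `${instruction} via ${llmClient.modelName}`;
}

// Streaming replay: yields one event per cached step, all sharing the same resolved client.
function* tryReplayAsStreamSketch(steps: string[], llmClient: AgentLLMClient): Generator<string> {
  for (const step of steps) {
    yield `${step} via ${llmClient.modelName}`;
  }
}

const defaultClient: AgentLLMClient = { modelName: "default-model" };
const prepared = prepareAgentExecutionSketch("log in", { model: "custom-model" }, defaultClient);
const single = tryReplaySketch(prepared.instruction, prepared.llmClient);
const streamed = Array.from(tryReplayAsStreamSketch(["goto", "act"], prepared.llmClient));
const fallback = prepareAgentExecutionSketch("log in", {}, defaultClient);
```

Resolving in one place means the streaming and non-streaming paths cannot disagree about which model self-heals.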

Confidence Score: 5/5

- This PR is safe to merge with no identified issues.
- The implementation correctly threads the LLM client through all cache replay paths, the logic is sound, comprehensive unit tests validate the behavior, and the changes are backward-compatible (falling back to the default client when no override is provided).
- No files require special attention.

Important Files Changed

Filename	Overview
packages/core/lib/v3/cache/ActCache.ts	Added `llmClientOverride` parameter to `tryReplay` and `replayCachedActions` methods to use correct model during cache replay with self-heal
packages/core/lib/v3/cache/AgentCache.ts	Added `llmClientOverride` parameter throughout agent cache replay chain to use correct model when cached actions fail and need re-execution
packages/core/lib/v3/v3.ts	Modified `act` method and `prepareAgentExecution` to resolve and pass LLM client to cache replay methods, ensuring user-specified model is used during cache self-heal
packages/core/tests/cache-llm-resolution.test.ts	New comprehensive unit tests validating that ActCache and AgentCache use override LLM client during replay and that non-LLM steps work without client

Sequence Diagram

sequenceDiagram
    participant User
    participant V3
    participant ActCache
    participant AgentCache
    participant ActHandler
    participant LLM

    Note over User,LLM: Act Cache Flow
    User->>V3: act(instruction, {model: "custom-model"})
    V3->>V3: resolveLlmClient(options.model)
    V3->>ActCache: prepareContext(instruction, page)
    ActCache-->>V3: actCacheContext
    V3->>ActCache: tryReplay(context, page, timeout, llmClient)
    ActCache->>ActCache: replayCachedActions(context, entry, page, timeout, llmClient)
    ActCache->>ActHandler: takeDeterministicAction(action, page, timeout, llmClient)
    alt Action succeeds
        ActHandler-->>ActCache: success
    else Action fails (self-heal)
        ActHandler->>LLM: Use llmClient (not default!)
        LLM-->>ActHandler: New action
        ActHandler-->>ActCache: Updated action
    end
    ActCache-->>V3: ActResult
    V3-->>User: Result with correct model used

    Note over User,LLM: Agent Cache Flow
    User->>V3: agent({model: "custom-model"}).execute(instruction)
    V3->>V3: prepareAgentExecution(options)
    V3->>V3: resolveLlmClient(options.model)
    V3->>AgentCache: tryReplay(context, llmClient)
    AgentCache->>AgentCache: replayAgentCacheEntry(context, entry, llmClient)
    loop Each step
        AgentCache->>AgentCache: executeAgentReplayStep(step, ctx, handler, llmClient)
        alt Act/FillForm step
            AgentCache->>ActHandler: takeDeterministicAction(action, page, timeout, llmClient)
            alt Action fails (self-heal)
                ActHandler->>LLM: Use llmClient (not default!)
                LLM-->>ActHandler: New action
            end
        else Goto/Scroll/Wait step
            Note over AgentCache: No LLM needed
        end
    end
    AgentCache-->>V3: AgentResult
    V3-->>User: Result with correct model used

cubic-dev-ai

1 issue found across 6 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".changeset/mean-melons-repeat.md">

<violation number="1" location=".changeset/mean-melons-repeat.md:5">
P2: Changeset message describes the bug behavior rather than the fix. The description says it &#39;uses default model&#39; but the actual fix makes cached replays use the *originally specified model* instead of the default. Consider rewording to clarify the fix, e.g., &quot;fix: replaying cached actions (for agent &amp; act) now uses the originally-specified model instead of default when rerunning inference after failure&quot;</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

cubic-dev-ai · 2026-01-05T23:07:48Z

.changeset/mean-melons-repeat.md

+"@browserbasehq/stagehand": patch
+---
+
+fix: replaying cached actions (for agent & act) uses default model when action fails and rerunning inference is needed


P2: Changeset message describes the bug behavior rather than the fix. The description says it 'uses default model' but the actual fix makes cached replays use the originally specified model instead of the default. Consider rewording to clarify the fix, e.g., "fix: replaying cached actions (for agent & act) now uses the originally-specified model instead of default when rerunning inference after failure"

✅ Addressed in 2240a0d

seanmcguire12 marked this pull request as ready for review January 5, 2026 23:05

cubic-dev-ai bot reviewed Jan 5, 2026

seanmcguire12 added 3 commits January 5, 2026 15:10

- use correct model on cache replay failure (d05cccb)
- add unit tests (40a95be)
- update changeset for clarity (615c997)

seanmcguire12 force-pushed the seanmcguire/stg-930-bug-replaying-cached-agent-actions-uses-default-model branch from 2240a0d to 615c997 January 5, 2026 23:10

pirate approved these changes Jan 5, 2026

seanmcguire12 merged commit 088c4cc into main Jan 5, 2026
30 of 31 checks passed

This was referenced Jan 5, 2026

- Version Packages #1479 (Open)
- Version Packages CloudEngineHub/stagehand#1 (Open)
- Version Packages natewong1313/stagehand#1 (Merged)
- Version Packages mz0in/stagehand#1 (Open)
