
Add per-input trialCount support to Eval() #1341

Open

dflynn15 wants to merge 1 commit into braintrustdata:main from dflynn15:feature/js-per-input-trial-count

Conversation

@dflynn15 dflynn15 commented Feb 5, 2026

Why?

Braintrust's Eval() function supports a trialCount parameter that runs each input multiple times to measure variance in non-deterministic LLM outputs. However, this setting applies globally to all inputs, which creates friction in several evaluation workflows. For example:

  1. Targeted Debugging is Expensive: When investigating a single flaky test case, you want to run it 10-20 times to understand the variance pattern. With global trialCount, this means running your entire suite 10-20 times, multiplying costs and wait time unnecessarily.

  2. Mixed Determinism is Common: Real evaluation suites contain a mix of deterministic scenarios (math problems, factual lookups) and non-deterministic ones (creative writing, open-ended reasoning). Forcing the same trial count on both wastes resources.

  3. Cost Scales Linearly: Every additional trial means another LLM API call. A global trialCount: 5 on a 100-item dataset means 500 API calls, even if only 10 items actually need variance analysis.

To address this, we've built a custom solution that I want to propose as a contribution. Specifically, it allows each data item to specify its own trialCount, overriding the global default. This gives users fine-grained control over where to invest their evaluation budget.


What?

Eval("My Project", {
  data: [
    { input: "stable query", expected: "..." },                    // Uses global (3)
    { input: "flaky query", expected: "...", trialCount: 10 },     // Override to 10
    { input: "deterministic", expected: "...", trialCount: 1 },    // Override to 1
  ],
  task: myTask,
  scores: [Factuality],
  trialCount: 3, // Global default
});
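
With this configuration, the suite performs 3 + 10 + 1 = 14 task runs instead of the 9 (3 items × 3 trials) that a purely global trialCount: 3 would produce, concentrating the extra budget on the flaky case.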

There is a corresponding Python PR to match it here: #1342

Allow data items to specify their own trialCount, overriding the global
evaluator setting. This enables targeted debugging of flaky test cases
and mixed determinism scenarios without multiplying the entire suite.

- Add optional `trialCount` field to `EvalCase` type
- Per-item trialCount takes precedence over global trialCount
- Items without trialCount use the global value (or 1 if neither is set); see the sketch below
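
A minimal sketch of that precedence rule, assuming simplified types: the SDK's real EvalCase type carries more fields, and resolveTrialCount is a hypothetical helper name used here for illustration, not the SDK's internal function.

// Simplified stand-in for the SDK's EvalCase type.
interface EvalCase<Input, Expected> {
  input: Input;
  expected?: Expected;
  trialCount?: number; // optional per-item override
}

// Hypothetical helper: the per-item trialCount wins, then the
// global setting, then a default of 1.
function resolveTrialCount(
  item: EvalCase<unknown, unknown>,
  globalTrialCount?: number
): number {
  return item.trialCount ?? globalTrialCount ?? 1;
}

resolveTrialCount({ input: "flaky query", trialCount: 10 }, 3); // => 10
resolveTrialCount({ input: "stable query" }, 3);                // => 3
resolveTrialCount({ input: "stable query" });                   // => 1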