Distributed Tracing for remote scorers #33

Merged
realark merged 3 commits into main from ark/vcr-otel
Feb 6, 2026

realark (Collaborator) commented Feb 4, 2026

  • Use the real Braintrust API client in TestHarness (pointed at VCR)
  • Emit distributed traces for remote scorers
  • Create one span per scorer in evals

This produces eval traces like this:

https://www.braintrust.dev/app/braintrustdata.com/p/andrew-misc/trace?object_type=experiment&object_id=ff8cb35c-4e3a-4eb4-ad45-16ce97a9a3b8&r=0b3378817b514de1c5367fb9ba07c60c&s=d5042e3790a0e674

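As a rough illustration of the "one span per scorer" and "distributed traces for remote scorers" items above, here is a minimal stdlib-only sketch. Every name in it (SketchSpan, SketchTracer, ScorerSpanSketch, and the trace_id/span_id keys) is a hypothetical stand-in, not the OpenTelemetry API or the SDK's actual parent format. It only shows the shape of the idea: each scorer gets its own child span under the eval span, and a remote scorer's request can carry that span's identifiers so the remote side can attach to the same trace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a span; not the OpenTelemetry or Braintrust API.
final class SketchSpan {
    final String name;
    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the root span

    SketchSpan(String name, String traceId, String spanId, String parentSpanId) {
        this.name = name;
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }
}

// Minimal in-memory tracer that records every span it starts.
final class SketchTracer {
    final List<SketchSpan> spans = new ArrayList<>();

    SketchSpan startRoot(String name) {
        return start(name, "trace-1", null);
    }

    SketchSpan startChild(SketchSpan parent, String name) {
        return start(name, parent.traceId, parent.spanId);
    }

    private SketchSpan start(String name, String traceId, String parentSpanId) {
        SketchSpan span =
                new SketchSpan(name, traceId, "span-" + (spans.size() + 1), parentSpanId);
        spans.add(span);
        return span;
    }
}

final class ScorerSpanSketch {
    // Parent payload a remote scorer request could carry so the remote side
    // can attach its spans to the same trace. Key names are assumptions.
    static Map<String, Object> parentFor(SketchSpan span) {
        if (span == null) {
            return null; // no tracing context available
        }
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("trace_id", span.traceId);
        parent.put("span_id", span.spanId);
        return parent;
    }

    // Opens one child span per scorer under the eval span and returns the
    // parent payload each scorer would send along with its request.
    static List<Map<String, Object>> runScorers(
            SketchTracer tracer, SketchSpan evalSpan, List<String> scorerNames) {
        List<Map<String, Object>> parents = new ArrayList<>();
        for (String scorerName : scorerNames) {
            SketchSpan scorerSpan = tracer.startChild(evalSpan, "score:" + scorerName);
            parents.add(parentFor(scorerSpan));
        }
        return parents;
    }
}
```

In this sketch, a null span yields a null parent, mirroring the "or null if tracing context not available" contract quoted in the review below.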

realark added the enhancement (New feature or request) label Feb 4, 2026
realark force-pushed the ark/vcr-otel branch 3 times, most recently from 907fb9c to 6ed7152, February 5, 2026 19:06
realark marked this pull request as ready for review February 6, 2026 01:40
realark requested a review from delner February 6, 2026 01:45
realark changed the title from "Distributed Tracing for remote LLM scorers" to "Distributed Tracing for remote scorers" Feb 6, 2026
delner left a comment

Personally, I'm not a big fan of explicit tracing code in the eval scorer implementation, but I'll defer on this (not blocking).

* @return parent object for distributed tracing, or null if tracing context not available
*/
@Nullable
private Map<String, Object> buildParentSpanComponents() {

IMO, having tracing in the Eval scorer implementation seems like a bit of a code smell... it creates a hard coupling between Evals and OpenTracing, which can be used separately from one another.

I'd recommend decoupling this through some appropriate abstraction: perhaps dependency injection, composition, or some other pattern.
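One way to picture the dependency-injection option mentioned above is the sketch below. It is not code from this PR: InjectedParentScorer, its constructor, and the "output"/"parent" keys are all hypothetical names chosen for illustration. The scorer never touches tracing directly; it asks an injected supplier for an optional parent payload and forwards whatever it gets.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical remote scorer that knows nothing about tracing; it only asks
// an injected supplier for an optional parent payload.
final class InjectedParentScorer {
    private final Supplier<Optional<Map<String, Object>>> parentSupplier;

    InjectedParentScorer(Supplier<Optional<Map<String, Object>>> parentSupplier) {
        this.parentSupplier = parentSupplier;
    }

    // Builds the request body for a remote scoring call. The "parent" key is
    // only added when the supplier has a tracing context to offer.
    Map<String, Object> buildRequest(String output) {
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("output", output);
        parentSupplier.get().ifPresent(parent -> request.put("parent", parent));
        return request;
    }
}
```

With this shape, a tracing-aware caller would pass a supplier backed by the current span, while a caller without tracing would pass one that returns Optional.empty().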

realark (Collaborator, Author) replied:

I don't follow? Evals and otel are already coupled, in the sense that the data created by evals is captured mostly through otel traces, but in terms of what appears in the public APIs for evals, this doesn't change anything.

https://github.com/braintrustdata/braintrust-sdk-java/blob/main/src/main/java/dev/braintrust/eval/Scorer.java

I could add an additional method allowing explicit parent info to be passed when scoring:

// something like this
List<Score> score(TaskResult<INPUT, OUTPUT> taskResult, ParentInfo parentInfo);

But I'm not sure what that buys us? In this case it would make the surface area of the public API a bit larger, and it would only be invoked by callers within otel traces.

It would make sense in the context of a larger refactor to decouple otel from evals, though. That seems beyond the scope of this PR, so I'll press on, but let's chat about it some time.
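For reference, the extra overload floated in the reply above could be sketched roughly as follows. The types here (SketchScore, SketchParentInfo, SketchScorer) are simplified hypothetical stand-ins; the SDK's real Scorer, Score, and TaskResult types are not reproduced. A default method is one assumed way to add the overload without breaking existing scorers, and it also makes the trade-off visible: the public interface grows by one method that most implementations would never override.

```java
import java.util.List;

// Hypothetical simplified stand-ins; the SDK's Scorer, Score, and TaskResult differ.
final class SketchScore {
    final String name;
    final double value;

    SketchScore(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

final class SketchParentInfo {
    final String spanId;

    SketchParentInfo(String spanId) {
        this.spanId = spanId;
    }
}

interface SketchScorer<OUTPUT> {
    List<SketchScore> score(OUTPUT taskResult);

    // Extra overload carrying explicit parent info. The default ignores it,
    // so existing scorers keep compiling; remote scorers could override it.
    default List<SketchScore> score(OUTPUT taskResult, SketchParentInfo parentInfo) {
        return score(taskResult);
    }
}
```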

realark merged commit 1a2b623 into main Feb 6, 2026
1 check passed
realark deleted the ark/vcr-otel branch February 6, 2026 17:39