Rework span record benchmark and publish results by jack-berg · Pull Request #8031 · open-telemetry/opentelemetry-java

jack-berg · 2026-01-28T22:14:57Z

Follow-up to #8000.

jack-berg · 2026-01-28T22:16:25Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/BenchmarkUtils.java

+
+  /**
+   * The number of record operations per benchmark invocation. By using a constant across benchmarks
+   * of different signals, it's easier to compare benchmark results across signals.


If span, metric, and log benchmarks all record the same number of operations per benchmark invocation, we can see the relative cost of spans vs. logs vs. metrics. Even though it will never be a perfect apples-to-apples comparison, it's still useful to know the order-of-magnitude cost of the different signals.

Make sense?

jack-berg · 2026-01-28T22:18:28Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/MetricRecordBenchmark.java

 /**
- * Notes on interpreting the data:
+ * This benchmark measures the performance of recording metrics and includes the following
+ * dimensions:


One of my initial concerns with public benchmarks was that they need to be contextualized.

To address this, I'd like to:

- Put some effort into making the javadoc for our public benchmarks up to date and useful
- Update the benchmark static webpage to link to the relevant javadoc for each benchmark

jack-berg · 2026-01-28T22:20:14Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/SpanRecordBenchmark.java

+ * BatchSpanProcessor} paired with a noop {@link SpanExporter}. In order to avoid quickly outpacing
+ * the batch processor queue and dropping spans, the processor is configured with a queue size of
+ * {@link SpanRecordBenchmark#RECORDS_PER_INVOCATION} * {@link SpanRecordBenchmark#MAX_THREADS} and
+ * is flushed after each invocation.


This is a key aspect of a useful span record benchmark (and log record benchmark), IMO. We need to isolate the benchmark from the export path, which is noisy due to its network dependency, while still being realistic. My definition of realistic is a batch span processor plus a harness that ensures spans aren't just being dropped on the floor because of a full queue.
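A simplified, stdlib-only model of the queue sizing described above, offered as a sketch rather than the benchmark's actual code: `ArrayBlockingQueue.offer` stands in for enqueueing a span (it returns `false` instead of blocking when full, i.e. a drop), and clearing the queue stands in for the per-invocation flush. The constant values are hypothetical, threads are simulated sequentially, and the model ignores the processor's background worker (which drains the queue concurrently in the real `BatchSpanProcessor`), so it shows the worst case. In the SDK itself, the corresponding knobs are `BatchSpanProcessor.builder(exporter).setMaxQueueSize(...)` and `forceFlush()`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSizingSketch {
  // Hypothetical values; the real constants live in the benchmark sources.
  static final int RECORDS_PER_INVOCATION = 100;
  static final int MAX_THREADS = 4;

  /** Simulates benchmark invocations on every thread and returns the number of dropped records. */
  static int simulate(int queueCapacity, int invocations, boolean flushAfterInvocation) {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
    int dropped = 0;
    for (int inv = 0; inv < invocations; inv++) {
      // Each of MAX_THREADS threads records RECORDS_PER_INVOCATION items (sequential model).
      for (int t = 0; t < MAX_THREADS; t++) {
        for (int r = 0; r < RECORDS_PER_INVOCATION; r++) {
          // offer() returns false when the queue is full: the record is dropped.
          if (!queue.offer(r)) {
            dropped++;
          }
        }
      }
      if (flushAfterInvocation) {
        // Stands in for a flush that drains everything to a noop exporter.
        queue.clear();
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    int sized = RECORDS_PER_INVOCATION * MAX_THREADS;
    System.out.println("sized + flush:      " + simulate(sized, 10, true));
    System.out.println("sized, no flush:    " + simulate(sized, 10, false));
    System.out.println("undersized + flush: " + simulate(sized / 2, 10, true));
  }
}
```

Only the combination of a queue sized to `RECORDS_PER_INVOCATION * MAX_THREADS` and a flush after each invocation drops nothing; remove either and the benchmark starts measuring the cost of dropping spans instead of recording them.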

jack-berg · 2026-01-28T22:21:55Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/SpanRecordBenchmark.java

+    }
+  }
+
+  public enum SpanSize {


Check this out: if we have individual parameters for the number of attributes, number of events, and number of links, we end up with a combinatorial explosion and a lot of noise. What we really want to characterize is the performance of different sizes of spans, where a size is a composite of several dimensions.
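A sketch of the composite-size idea, assuming illustrative counts (the actual `SpanSize` values from the PR aren't shown here): each enum constant bundles the attribute, event, and link counts, so the benchmark exposes a single dimension instead of three. The `independentCases` helper is hypothetical and only counts how many cases three independent dimensions would produce.

```java
public class SpanSizeSketch {
  /** Hypothetical composite "t-shirt size"; the counts are illustrative, not the PR's values. */
  enum SpanSize {
    SMALL(0, 0, 0),
    MEDIUM(10, 1, 1),
    LARGE(100, 10, 10);

    final int attributes;
    final int events;
    final int links;

    SpanSize(int attributes, int events, int links) {
      this.attributes = attributes;
      this.events = events;
      this.links = links;
    }
  }

  /** Number of cases if each of `dimensions` parameters independently had `options` values. */
  static int independentCases(int options, int dimensions) {
    int cases = 1;
    for (int i = 0; i < dimensions; i++) {
      cases *= options;
    }
    return cases;
  }

  public static void main(String[] args) {
    for (SpanSize size : SpanSize.values()) {
      System.out.printf(
          "%s: %d attributes, %d events, %d links%n",
          size, size.attributes, size.events, size.links);
    }
    // Three options for each of three dimensions vs. three composite sizes (per thread count).
    System.out.println("independent params: " + independentCases(3, 3));
    System.out.println("composite sizes:    " + SpanSize.values().length);
  }
}
```

With three options per dimension, independent parameters yield 27 cases per thread count, while the composite dimension yields 3, which keeps the published results small enough to read.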

codecov · 2026-01-28T22:31:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.21%. Comparing base (87b0d9a) to head (88f623a).
⚠️ Report is 60 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #8031      +/-   ##
============================================
+ Coverage     90.16%   90.21%   +0.04%     
- Complexity     7484     7606     +122     
============================================
  Files           836      841       +5     
  Lines         22562    22923     +361     
  Branches       2237     2291      +54     
============================================
+ Hits          20344    20680     +336     
- Misses         1515     1526      +11     
- Partials        703      717      +14


trask · 2026-02-16T03:37:05Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/SpanRecordBenchmark.java

+        span.setAttribute(
+            benchmarkState.attributeKeys.get(j), benchmarkState.attributeValues.get(j));
+      }
+      for (int j = 0; j < benchmarkState.exceptions.size(); j++) {
+        span.recordException(benchmarkState.exceptions.get(j));
+      }
+      for (int j = 0; j < benchmarkState.linkContexts.size(); j++) {
+        span.addLink(benchmarkState.linkContexts.get(j));
+      }


Might be interesting to also benchmark adding attributes and links to the span builder.

Do you expect perf differences between adding to the SpanBuilder vs. the Span? I can imagine so, if spans are sampled out based on data recorded on the SpanBuilder.

Do you have any thoughts on how I might include a dimension for that while avoiding an explosion of test cases and keeping things easy to understand? I like the current framing of a "span size" dimension with t-shirt size options (small, medium, large).

> Do you expect perf differences between adding to the SpanBuilder vs. Span?

I'm not sure. Maybe run it locally to see, and if they seem similar, there's no need to add it.

Switched the benchmark from recording attributes / links on Span to recording on SpanBuilder:

    SpanBuilder spanBuilder = benchmarkState.tracer.spanBuilder("test span name");
    for (int j = 0; j < benchmarkState.attributeKeys.size(); j++) {
      spanBuilder.setAttribute(
          benchmarkState.attributeKeys.get(j), benchmarkState.attributeValues.get(j));
    }
    for (int j = 0; j < benchmarkState.linkContexts.size(); j++) {
      spanBuilder.addLink(benchmarkState.linkContexts.get(j));
    }
    Span span = spanBuilder.startSpan();
    for (int j = 0; j < benchmarkState.exceptions.size(); j++) {
      span.recordException(benchmarkState.exceptions.get(j));
    }
    span.end();

Change in benchmark:

| Benchmark | Span Size | Baseline (ops/s) | With Span Builder Record (ops/s) | Difference (ops/s) | Change % |
|---|---|---|---|---|---|
| threads1 | SMALL | 8,035,454 | 8,154,087 | +118,633 | +1.48% |
| threads1 | MEDIUM | 1,296,857 | 1,282,624 | -14,233 | -1.10% |
| threads1 | LARGE | 108,910 | 108,768 | -143 | -0.13% |
| threads4 | SMALL | 9,581,626 | 10,853,479 | +1,271,853 | +13.27% |
| threads4 | MEDIUM | 3,401,908 | 2,708,558 | -693,350 | -20.38% |
| threads4 | LARGE | 440,806 | 355,208 | -85,598 | -19.41% |

To me, the difference between recording on Span vs. SpanBuilder looks like inter-run variance.
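The Difference and Change % columns follow from (candidate - baseline) / baseline * 100. A small sketch that reproduces two rows of the table from its raw values (the class and method names are illustrative only):

```java
public class ChangePercentSketch {
  /** Percent change of the candidate relative to the baseline, e.g. 1.48 for +1.48%. */
  static double changePercent(double baselineOpsPerSec, double candidateOpsPerSec) {
    return (candidateOpsPerSec - baselineOpsPerSec) / baselineOpsPerSec * 100.0;
  }

  public static void main(String[] args) {
    // Rows from the table: baseline (record on Span) vs. record on SpanBuilder.
    System.out.printf("threads1 SMALL:  %+.2f%%%n", changePercent(8_035_454, 8_154_087));
    System.out.printf("threads4 MEDIUM: %+.2f%%%n", changePercent(3_401_908, 2_708_558));
  }
}
```

At 4 threads, SMALL moves up by about 13% while MEDIUM and LARGE move down by about 20%, so the sign of the change flips between sizes, which fits the inter-run variance reading better than a consistent cost difference.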

👍 no need to explode the matrix

sdk/all/src/jmh/java/io/opentelemetry/sdk/SpanRecordBenchmark.java

jack-berg · 2026-02-17T17:35:55Z

sdk/all/src/jmh/java/io/opentelemetry/sdk/MetricRecordBenchmark.java

  @Measurement(iterations = 5, time = 1)
-  public void record_4Threads(ThreadState threadState) {
-    record(threadState);
+  @OperationsPerInvocation(RECORDS_PER_INVOCATION)


Adding this annotation is going to render the historic benchmark results useless, since they won't be comparable with new results. Options:

- Wipe the history after merging this
- Manually adjust the historic results to align with the new config

Good point. I'm good with any approach, including not adding the annotation.

I do like the annotation. The output figures seem insanely low at first glance, until you read the fine print and understand that each reported operation is actually many record operations.
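With `@OperationsPerInvocation(N)`, JMH treats each invocation as N operations, so throughput scores are multiplied by N and average-time scores are divided by N. A sketch of the "manually adjust" option for historic data, assuming the per-invocation workload did not otherwise change and using a hypothetical `RECORDS_PER_INVOCATION` value; `ScoreMode` is a local stand-in, not JMH's `Mode` enum.

```java
public class HistoricScoreNormalization {
  // Hypothetical value; the real constant is defined in BenchmarkUtils.
  static final int RECORDS_PER_INVOCATION = 1000;

  /** Local stand-in for the two JMH modes whose units change with the annotation. */
  enum ScoreMode {
    THROUGHPUT,
    AVERAGE_TIME
  }

  /** Converts a score recorded without @OperationsPerInvocation into per-record units. */
  static double normalize(double oldScore, ScoreMode mode, int opsPerInvocation) {
    switch (mode) {
      case THROUGHPUT:
        // invocations per second -> records per second
        return oldScore * opsPerInvocation;
      case AVERAGE_TIME:
        // time per invocation -> time per record
        return oldScore / opsPerInvocation;
      default:
        throw new IllegalArgumentException("unsupported mode: " + mode);
    }
  }

  public static void main(String[] args) {
    System.out.println(normalize(5_000, ScoreMode.THROUGHPUT, RECORDS_PER_INVOCATION));
    System.out.println(normalize(200_000, ScoreMode.AVERAGE_TIME, RECORDS_PER_INVOCATION));
  }
}
```

If the old benchmark recorded a different number of operations per invocation than the new one, the multiplier would have to be that old count rather than the new constant.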

Rework span record benchmark and publish results

5e2b5da

jack-berg requested a review from a team as a code owner January 28, 2026 22:14

jack-berg commented Jan 28, 2026


Fix build

fb8f290

trask approved these changes Feb 16, 2026


Add OperationsPerInvocation annotation

88f623a

jack-berg commented Feb 17, 2026


jack-berg merged commit bf3c4f3 into open-telemetry:main Feb 17, 2026
26 of 27 checks passed

This was referenced Feb 18, 2026:

- Normalize historic benchmark data after adding OperationsPerInvocation annotation #8100 (Merged)
- Add LogRecordBenchmark #8106 (Open)
