Benchmark Runner QoL Improvements #2672
Conversation
While trying to review, I ran the code but got an error. The file:

```c3
fn void foo() @benchmark {
    String s = string::format(mem, "Hello %s", "World");
    free(s);
}
```

Something appears to have broken the timing, as I'm getting negative values for cpu-time. I applied the suggestion I gave in the reviews below for the …
Sorry, as the author of the initial implementation of benchmarking, I strongly object to this change.
Can you provide info why?
I respect your opinion about this, but a median is necessary for the reasons I detailed in the initial PR. The call to quicksort happens after all samples are already taken, if and only if the median report will be given. If you don't want to use a median, it's disabled by default. You can also turn it off permanently for your bench programmatically in a …
@data-man would you like to explain why?
It's already 6 a.m. for me; I'll write tomorrow if possible.
So, here are some thoughts:

- I think most users would prefer Markdown. But OK, MD support can be added later.
- The compiler does not need to be patched for this. I propose to add …
Can we have a follow-up to those concerns, @NotsoanoNimus?
Perhaps "slightly" useless, but not entirely so. The fact of the matter is that a median metric from the runtime will shield its results from significant outliers, which would otherwise skew an average value. I don't see why a metric which says "at least 50% of runs hit this performance mark" isn't useful.
Totally agree with this. CSV was a quality-of-life, might-as-well addition to allow at least some automated parsing of runtime results if one so desires. Given that a Markdown table format is just as simple as joining values with commas à la CSV, I would have neither difficulty nor objection to adding it as another option.
This isn't a bad idea, and I wouldn't have a problem changing this around some more to accommodate it. A struct-splat with … I'd have to double-check that nothing is required before the runtime gets to an …
For sure. Rather than create a new one or maintain my own, I thought it better to show some love to the current runtime without changing what it actually does too drastically (despite shifting its inner workings). Thanks for your honest feedback, @data-man. Ultimately, my heart won't be broken if this PR isn't merged. But before that's determined, let me run through this TODO list and make some changes for you to review further.
Should this be a draft until you update?
What's the current status?
@lerno, I'd like to know your opinion about a progress bar. Is it really necessary?
To me a progress bar is not necessary, especially given that it can interfere with measurements. I think the final list is the most important.
The progress bar does not affect the sampling of each run, at least not explicitly (there might be some other cache/performance effect outside of the code). Regardless, I don't think I'm going to continue working on these improvements right now: the further changes I've made have deviated so far from the original PR that it's not really sensible to push them without opening another PR sometime in the future. And I don't really have time to iterate on this at the moment.
Wait, wasn't this more than the progress bar? There's already one progress bar that we have; it's the one that is not necessary to me, although I can understand that seeing it might be nice. Just a spinner might be sufficient, though. And didn't this one also have a CSV output?
I work with `compile-benchmark` quite often, so it's only fair I help with its upkeep. I slapped this together in a couple of hours because I really need more consistent results and a way to visualize them, and the benchmark runner is a bit of a rat's nest.

Changes, in no particular order:

- `NanoDuration`, so it can be used elsewhere as desired